INTELLIGENCE TESTS
BY RUDOLF PINTNER 


Teachers College, Columbia University 


General. This summary covers the year 1930, together with some 
earlier references which have been overlooked in previous annual
summaries. Hildreth’s (77) book is devoted very largely to the 
problem of intelligence testing in school. She discusses the tests 
available, the administration of individual and group tests and the 
interpretation of test results. Madsen (106) covers educational 
measurement in general in the elementary school and devotes two 
chapters to intelligence testing. Odell (123) covers the same ground 
for the high school and describes seven group intelligence tests suit- 
able for high school work. Brief treatments of the general problem 
of intelligence testing are contained in the books of Moss (118) and 
Inskeep (84). In a work-book for the teaching of measurement in 
general, Park (125) includes good exercises and questions in relation 
to intelligence testing. Pintner (131) gives the usual annual sum- 
mary in this journal with a bibliography of 180 titles. 

Dunlap (52) discusses the meaning and value of intelligence 
testing, criticizing many of the conclusions that have been arrived 
at by some writers. Carmichael (27) discusses the relationship 
between the psychology of learning and the psychology of testing, 
and suggests that tests could be used more in teaching as a starting 
point for future learning. Many different problems are taken up in 
the mimeographed report of the conference on individual differences 
(Anon., 4). By means of a questionnaire McClure (109) investi- 
gates the present status of psychological testing in large city public 
school systems. He finds that 21 out of 86 reported separate 
psychology departments. He lists the various uses to which the tests 
have been put. 

The Meaning of Intelligence. A general discussion of the meaning
of intelligence is contained in the book by Filter and Held (54).
They stress the importance of environment in conditioning the indi- 
vidual and hence in determining his intelligence. Laycock (97) 
criticizes severely the definition of intelligence as “adaptability to
new situations,” and after much experimental work comes to the 
conclusion that “the psychological processes involved in adaptability 
to new situations are eductive, reproductive and explorative.” 
Hamilton (69) presents an excellent discussion of the general 
assumptions underlying mental testing. Meili (113) divides general 
intelligence into four kinds. He then constructs tests to measure 
these kinds and by use of Kelley’s method arrives at six group 
factors. These factors are points of view from which to view the 
“act of intelligence,” but this in itself is always unique. The varying
amounts of the different kinds of intelligence possessed by any 
individual can be represented on a diagram and this total picture 
gives the form of intelligence. 

Relation of Intelligence to Other Factors. Strang (152) finds 
that social intelligence measured by the George Washington Social 
Intelligence Test correlates .44 with general intelligence for 311 
graduate students, and similarly Broom (20) finds a correlation of 
.58 with the Thorndike Intelligence Test for 646 college freshmen. 
A correlation of +.48 between M.A. and social intelligence is 
reported by Scudder and Raubenheimer (141). In many different 
tests McFarland (111) finds a general factor of speed and this cor- 
relates high with ability on the tests. 

The relationship between intelligence and physique is thoroughly 
examined in the notable contribution by Paterson (126). This book 
brings together and evaluates all previous work in this field. He 
finds a low positive correlation between intelligence and height,
weight, head measurements, anatomical age, and morphologic index. 
There is no correlation with pubescence and dental development. No 
measurable influence on intellect is caused by malnutrition, diseased 
tonsils, adenoids, defective teeth or hookworm. Mental development 
continues independently of physical factors and diseases, except those 
diseases which directly attack the nervous system. The book is also 
valuable for its critique of the methodology of investigation of the 
relationship between intelligence and physical factors. A shorter 
account of these same findings is given by Paterson (127) in another 
report. Stoke (150) reports correlations between I.Q. and height,
+.20; weight, +.25; and anatomic index, +.09. Cattell (32)
reports a thorough study of dentition and intelligence. She finds that
these two factors develop independently when chronological age is
kept constant. The correlation by age groups ranges from +.05 to 
+.12. For 500 children from five to twelve, the correlation with 
C.A. constant is +.11. Similarly she finds small positive correla- 
tions between M.A. and anthropometric measures, and M.A. and 
anatomic index when C.A. is held constant. Abt et al. (1) from a
study of the records of 1,000 white children at the Institute for 
Juvenile Research in Chicago find correlations of —.41 for boys 
and —.39 for girls between I.Q. and age of talking, and —.36 for 
boys and —.37 for girls for age of walking. The average I.Q. of 
these cases is about 81. Blonsky (14) measures the alkalinity of the 
saliva. For twelve pairs equated for age and sex, he finds that girls 
with high alkalinity have an I.Q. of 82.1 as contrasted with 73.7 for
low alkalinity, and for boys the average I.Q.s are 90.0 and 74.2
respectively. High alkalinity goes with high I.Q. 

Three reports deal with the relationship between intelligence and 
reflex conduction rates. Travis and Young (159), working with 
university students and children, find no correlation, thus reversing
the previous findings of the senior author who reported a high posi- 
tive correlation. Travis and Dorsey (158) find no differences in the 
reflex times of feebleminded and superior children having an average 
I.Q. of 119. Whitehorn et al. (168), working with 13 feebleminded
and 13 normal subjects, find a correlation of +.37 between M.A. and
speed of the knee jerk reflex, but this drops to +.15 when stature is 
held constant. 
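
The drop from +.37 to +.15 reflects a first-order partial correlation, in which the association of each variable with stature is removed before M.A. and reflex speed are compared. A minimal sketch of the standard formula follows. The function name and the two stature correlations are hypothetical, chosen only for illustration; the summary above gives neither, so the output is not expected to reproduce the +.15 cited.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant.

    r_xy, r_xz and r_yz are the ordinary (zero-order) correlations
    between the three pairs of variables.
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("r_xz and r_yz must lie strictly between -1 and 1")
    return (r_xy - r_xz * r_yz) / denominator


# x = M.A., y = speed of the knee jerk reflex, z = stature.
r_ma_reflex = 0.37        # zero-order value quoted in the text
r_ma_stature = 0.60       # hypothetical, not from the source
r_reflex_stature = 0.50   # hypothetical, not from the source

print(round(partial_correlation(r_ma_reflex, r_ma_stature, r_reflex_stature), 2))
```

Whenever both variables correlate positively with the third, as in these assumed figures, the partial correlation falls below the zero-order value.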

With reference to motor ability, Seashore (142) finds that the 
separate tests correlate from +.23 to —.33, while the whole battery 
correlates —.14 with intelligence among 50 students. Reaction time 
correlates very low with the Thorndike intelligence test for 253 
students, according to Livesay and Louttit (101). The correlations 
do not differentiate the four racial groups into which the students 
are divided. 

The Minnesota Mechanical Ability Tests devised by Paterson 
and Elliot (128) correlate only +.13 with intelligence and the
authors conclude that mechanical ability is a unique trait. 
Crockett’s (43) measure of manual ability correlates +.14 with intel- 
ligence for 87 cases. Scudder and Raubenheimer (141) find corre- 
lations between M.A. and the McQuarrie Test of +.05 and the 
O’Rourke Mechanical Test of +.26.

Various correlations between intelligence and miscellaneous tests 
have been reported. Intelligence and copying geometrical figures
correlates +.18 for 180 cases in Grades IV to VI, according to
Bousfield (17). Intelligence and ability to solve the Decroly puzzle 
box correlates +.52 for 183 backward children according to 
Rosenthal-Weiss and Rosenthal (138). The median of nine correlations
between mirror-drawing ability and intelligence is +.01 for
various groups of subjects according to Clinton (37). The correla- 
tion between the Meier-Seashore Art Judgment Test and intelligence 
ranges from +.28 to —.14 for six groups of high school and univer- 
sity students according to Meier and Seashore (112). Spence and 
Townsend (144) find that ten students scoring high on the Thurstone 
Intelligence Test do much better on the finger maze than ten students 
scoring low in intelligence. The new type examination correlates 
+.52 with Army Alpha scores, the old essay type examination only
+.28, for 102 students in educational psychology reported by 
Corey (40). 

The relationship between intelligence and birth order in the 
family is investigated by Steckel (146) on 6,790 cases. There is no 
difference between first, second, third, etc., when no allowance is 
made for poorer intelligence of larger families. If, however, siblings 
only are compared, there is a decided rise in mean intelligence from 
first up to eighth born. Blonsky (15) finds a decided relationship 
between I.Q. and month of birth. Children born in the spring months 
have higher I.Q.s. This is due to the better air, sunshine and food
obtained by the infant during the spring and summer months. 
Sutherland (153) continues his studies of the relationship between 
I.Q. and size of family. He compares fatherless children with
non-fatherless children of the same age, sex and number of siblings. The I.Q.s
decrease with size of family for both groups, but the correlation for 
the fatherless groups is lower (—.19) than the correlations for the 
controls (—.23 and —.26), because the natural size of the family 
has been limited by the death of the father. 

Several reports deal with the relationship between personality, 
character, emotional factors and intelligence. Hartshorne, May and 
Shuttleworth (71) find positive correlations between intelligence and 
honesty, foresight and total character integration, although such cor- 
relations are in general not high. Broom (21) finds no correlation 
between intelligence and scores on the Allport A-S Test for 200 
college students. Intelligence and temperament as measured by the 
Downey Test correlates +.21 for about 25 boys, according to 
Oates (122). Contrasting the extreme cases, i.e., those deviating 
by more than one sigma, on the Woodworth-Cady questionnaire,
Keys and Whiteside (91) find that the emotional are 18 points lower
in I.Q. and two years lower in M.A. and E.A. Similarly Hirsch (80)
finds that the more intelligent show less emotional instability as meas- 
ured by the Woodworth-Mathews questionnaire. The correlations 
with intelligence are —.25 for boys, and —.14 for girls. Weber (165) 
constructs a scale to measure emotional age and finds a correlation 
between emotional age and M.A. of +.42 for 231 cases in grades IV 
to VII. Olson (124) reports correlations between intelligence and 
ratings on his rating scale for problem tendencies. The average of 
12 correlations for separate age groups is —.27. Preschool children 
are scored for various traits by Goodenough (64) and these scores 
are correlated with M.A. The highest correlation is +.7 for socia- 
bility and the r’s go down to —.3 for compliance. Rodgers (136) 
measures self-appraisal by having students mark the items of an 
intelligence test as to whether they think they know their answer is 
correct or not. Highest mental scores show greatest self-appraisal. 

Growth and Constancy of Growth. Studying the results of six 
semi-annual tests of the same 183 children, Jordan (85) finds the 
growth curves for ages eight to fourteen to be straight up to about 
year ten or eleven and then to show negative acceleration, which he 
attributes to the poverty of their environment. He finds no tendency 
for the high and low groups to diverge. Keen (89) finds a curvilinear 
growth curve culminating about age nineteen. Williams (172) 
applies the Thurstone method of scaling to data on the Goodenough 
Drawing Tests and finds a curve showing definite negative accelera- 
tion. Rogers (137) finds gains in retests on the Thorndike Intelli- 
gence Test given to college students over a period of three years and 
argues for a growth of intelligence. Oates (121) studies the increase 
in score on various tests for boys, ages eleven to eighteen, and finds
that motor abilities mature relatively early whereas intelligence 
matures later. He analyzes the activities involved in intelligence tests 
from the point of view of the functional maturity of the processes 
underlying them. Sorensen (143) finds a negative correlation 
between age and learning ability for a group of adults who had not 
studied for a long time, but no such negative correlation for two 
other groups of adults who had continued learning. Hence, he con- 
cludes, there is no real decline in ability to learn due to age. There 
is merely a rustiness due to lack of practice. Conrad (39) analyzes 
the sub-tests of Army Alpha with adults, and finds that scores decline 
as age increases on all except the information test. Here scores 
increase with age. He suggests an information test for testing adult
intelligence, where a rise in score would indicate a decline in
intelligence. 

Foran (55) summarizes the results of re-tests of the Stanford- 
Binet from 1926 to 1929. The re-test correlation varies from .80 to 
.95, and the P.E. is about 5 I.Q. points. Hirsch (80) gives results
for about 300 children having about six tests each over a period of
five years. The average yearly change in I.Q. is 5.3. The average
of six re-test correlations is .863. Brown (24) reports results for 
707 problem children with two or more Binet examinations for each. 
The average amount of change in I.Q. points is 5.79 and the corre- 
lation of first with second test is .88. He also shows that the time 
interval between re-tests up to four years does not influence the 
correlation, and that the I.Q.s of the feebleminded (those below
I.Q. 60) are least likely to change. Baldwin et al. (9) report re-test
coefficients of .75 for 55 boys and .84 for 48 girls in one-room rural
schools tested one year apart; and also coefficients of .90 for 67 boys
and .80 for 94 girls in consolidated schools. Cuff (45) finds a re-test 
correlation of .98 for 144 children in Grade I on the Herring-Binet 
after an interval of twenty-four hours. The re-test coefficients for 
superior children after six years range from .77 to .81, according to 
Terman (156). Hetzer and Jenschke (76) report on re-tests of 24 
infants after three to fourteen months. When grouped in three 
groups, advanced, normal and retarded, only two cases change their 
grouping in the second test. Garrison (60) repeats the Yerkes Point 
Scale on 73 college students after an interval of ten years. His 
correlations between the 1916 and 1926 scores are .58 for 32 men 
and .76 for 41 women, thus showing fairly consistent results for a 
highly selected group over a long interval of time. Chipman (35) 
analyzes the re-tests of 1,751 cases in a feebleminded institution. No 
change greater than plus or minus five points is shown by 79 per 
cent, a loss of five or more points by 12 per cent, a gain of five or 
more points by 9 per cent. He also shows that in the calculation of 
I.Q.s the use of 14 instead of 16 causes a much greater change in
I.Q. from test to re-test. Wheeler (167) shows a decreasing gain 
in mental age from year six to year ten for dull children tested over 
a period of four years. There is not much change in I.Q.
Valentine (162) gives the results for one child tested six times by 
the Gesell Tests and three times by the Binet. The I.Q.s are very 
constant. The average Gesell I.Q. is 136 and the average Binet
I.Q. is 150.
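
The role of the divisor mentioned in connection with Chipman's re-tests can be seen from the ratio definition of the I.Q., 100 × M.A. / C.A. The sketch below assumes that "14 instead of 16" refers to the maximum chronological age, in years, entered as the divisor for older subjects. That reading, the function name and the sample ages are assumptions made for illustration and are not taken from the report itself.

```python
def ratio_iq(mental_age: float,
             chronological_age: float,
             max_divisor_age: float = 16.0) -> float:
    """Ratio I.Q.: 100 * M.A. / C.A., with C.A. capped at max_divisor_age.

    All ages are in years. The cap is the assumed adult divisor:
    any chronological age above it is replaced by the cap itself.
    """
    if mental_age < 0 or chronological_age <= 0 or max_divisor_age <= 0:
        raise ValueError("mental age must be non-negative; other ages positive")
    divisor = min(chronological_age, max_divisor_age)
    return 100.0 * mental_age / divisor


# Hypothetical institutional case: M.A. of 8 years, C.A. of 20 years.
for cap in (16.0, 14.0):
    print(cap, round(ratio_iq(8.0, 20.0, max_divisor_age=cap), 1))
```

With these made-up ages the same mental age yields an I.Q. of 50.0 when the divisor is capped at 16 and about 57.1 when it is capped at 14, so the two conventions are not interchangeable when test and re-test results are compared.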

Influences Upon Intelligence Ratings. Freeman (56) gives a
general summary of the influences of environment upon intelligence.
He discusses the different points of view. Thinking is a process 
of organization and hence education can improve it and improve 
intelligence. Barrett and Koch (10) study seventeen carefully
matched pairs of preschool children. Those having had nursery
school training from six to nine months show a gain of 20.9 I.Q.
points, while those with no nursery school training gain only 5.1 I.Q. 
points. Cattell and Gaudet (33) find that the average I.Q. of various 
groups increases with repeated tests, and this increase is attributed 
to practice effect. Leahy and Fox (98) investigate the influence of 
emotion on the I.Q. A group of cases showing an observed emotional
state (not an emotional breakdown) on the first test is compared
with a non-emotional group. No significant change in I.Q.
from first to second test is shown by either group. The re-test
coefficient for the non-emotional group is .85 and for the emotional
group .93, hence the authors conclude that an emotional state does not
lower or raise the probable I.Q.

Carroll and Hollingworth (29) find that the Herring-Binet rates 
children lower than the Stanford-Binet. The average decrease in 
I.Q. for 80 cases ranging in I.Q. from 133 to 190 on the
Stanford-Binet is 17.2 I.Q. points. Fifty-two of these 80 cases tested a year
later show an average decrease of 19.2 I.Q. points. By comparing the
two Binet tests with the Stanford Achievement, they conclude that
the Stanford I.Q.s are the more valid. Steckel (145) finds that the
I.Q. is influenced by the test used. With over ten thousand cases in
grades I to XII tested on the Kuhlmann-Anderson, the N.I.T., and
the Otis S.A., the author has prepared percentile tables with equivalent
I.Q.s for each test for each percentile. Cattell (30) also finds
differences in I.Q.s from different tests. She gives the median
differences in I.Q. between the Binet and eight group tests.
Kuhlmann (95) finds that degree of difficulty influences the intelligence
score. Difficulty determines the amount of effort put forth by the 
subject. If a test is too easy, it becomes a speed test. No given test 
battery can give equally good results for three successive grades. 

Individual Scales and Group Tests. Three new scales and two 
new group tests have recently appeared. A Point Scale of Perform- 
ance Tests has been constructed by Arthur (6). This consists of ten 
tests in Form I and eight tests in Form II. The norms for Form II
are re-test norms. The scores on each test are turned into points 
and then into M.A.s. The Stanford Revision has been adapted by 
Hayes (74) for use with the blind. Twelve new tests have been
substituted for tests unsuitable for the blind and minor changes have
been made in others. The scale has been standardized on blind sub- 
jects. Linfert and Hierholzer (100) have published a scale for the 
first year of life. Series I consists of tests for one to four months; 
Series II for six to twelve months. A total score for each series 
gives an approximate M.A. The tests are largely based on the work 
of Gesell. The scale has been standardized on 300 infants, 50 in each age
group.

Pintner (132) has constructed a non-language group test for 
Kindergarten, Grades I and II. It is given by the use of pantomime 
and samples on the blackboard. Cattell (34) has published a group 
intelligence test in England. It consists of three scales: Scale I for 
ages eight to eleven; Scale II for ages eleven to fifteen; Scale III for
age fifteen up. There are two forms for each scale. The test con- 
tains the usual abstract verbal material. 

Schieffelin and Schwesinger (140) give a description of 186 non- 
verbal tests. They give useful information as to publisher, price, 
bibliography, etc. MacPhee and Brown (105) report results for 134 
cases on the Ferguson Form Boards. The re-test coefficient is .90. 
The scores show no significant age increase and the authors conclude 
that age norms are meaningless on these boards. Brown (22) reports 
results for 154 cases on the Kohs Block Design Test. He finds a 
correlation of .46 with N.I.T. mental ages. He does not believe the 
test has any clinical value. Norden (120) describes her plans for a 
new German revision of the Binet. She describes briefly the tests she 
proposes to use, which consist mainly of the original Binet tests and 
others added by various workers to other Binet revisions. As yet 
the scale has not been standardized. 

The Elementary School Pupil. St. John (149) gives the distri- 
bution of composite I.Q.s obtained from several tests for 503 boys 
and 455 girls. The correlation with school marks is .44. There is 
more maladjustment in school among boys than among girls. He 
reports case studies of those showing great disparity between I.Q.
and school achievement. Several reports deal with rural children. 
Baldwin et al. (9) study rural children in Iowa. They find no difference
in intelligence between city and rural preschool children in 72
matched cases. On the Detroit Kindergarten Test the rural children 
are inferior to the urban. Rural school groups vary in I.Q.; one 
group having a mean I.Q. of 92.5, and another 103. The average 
I.Q. of 235 one-room school children is 91.7 as contrasted with a 
mean I.Q. of 99.4 for 425 consolidated school children.
Hatcher (72) gives a distribution of I.Q.s of 87 Virginia mountain
children, showing a range from 46 to 114 with a median at 83. 
Ludeman and McAnelly (103) report results of 32 children living 
in a religious colony with a very restricted environment. The mean 
I.Q. for 13 cases in Grades II to IV on the Kingsbury is 72.6 and on
the Myers 72.1. The mean I.Q. for 19 cases in Grades V to VIII on
the N.I.T. is 66.3 and on the Myers 68.8. The authors conclude that
these low I.Q.s are due to the very limited environment.
Russell (139) in England reports results with the Northumberland 
Test in an eastern county. Among the county children (n = 2958) 
he finds 4.8 per cent having I.Q.s of 121 or above; among the borough
children this percentage rises to 7.5. The median I.Q. of the agri- 
cultural group is lower than that of the mining group. The more 
isolated the rural region, in general, the lower is the median I.Q. 
Hauck (73) in Germany finds children of industrial regions superior 
to those in agricultural regions.

The High School Pupil. Woody and Bergman (179) report 
results for seniors from many high schools. They give the corre- 
lations between intelligence on the Otis Test and achievement on the 
Iowa H. S. Content Examination. These correlations range from 
.37 to .50 for the separate subjects, with a correlation of .70 for the 
total examination for 800 cases. The average intelligence of seniors 
taking various courses is given, ranging from scientific, college pre- 
paratory down to agricultural, vocational. In the junior high school 
Marzolf (108) finds that I.Q. on the Terman Group Test correlates
.66 with school marks. Hardie (70) reports results for 761 secondary
school pupils in England. Intelligence correlates .37 with English 
and .28 with Arithmetic. Gardner and Hilton (57) compare 98 part- 
time children with 300 junior high school children in a rural district 
in Utah and find a mean I.Q. of 86.5 for the former and 94 for the 
latter. Stedman (147) finds a correlation of .55 between marks in 
bookkeeping and scores on the Terman Group Test. He believes that 
those with I.Q.s below 80 should not take up bookkeeping. In
selecting children for high school scholarships Candee (26) found
the I.Q. most significant. None below 90 received a scholarship and
only one above 120 was rejected. Symonds (154) gives correlations 
between intelligence and other tests and marks in school subjects. 
He finds no good prediction from the various tests for specific courses 
in high school. Weisman (166) finds the Stanford-Binet helpful in 
educational counseling. He believes that high school work can be 
done by pupils with I.Q.s as low as 75, but repetition and a longer
time are required by those with I.Q.s below 100. Pyle (134) makes a
study of 33 high school failures. He compares them with four suc- 
cessful students on various tests. In general the failures are below 
the age norms. He finds it absurd to talk of an I.Q. There are
as many I.Q.s as there are functions to be tested. Turney (161) dis- 
cusses factors other than intelligence which affect success in high 
school. Marks and M.A. correlate from .57 to .75 for four high 
school classes. He contrasts achievers and non-achievers. The 
former are those whose marks are one sigma above their intelligence 
rating and the latter are those whose intelligence is one sigma above 
their school marks. 

The Private School Pupil. Three reports deal with private 
schools. Dearborn and Cattell (48) find a median I.Q. of 119 for
1,295 children in 12 private schools in the Boston area. They con- 
trast this with a median I.Q. of 103 for 3,623 public school children. 
For the private schools Q1 is 109 and Q3 is 128; for the public schools
Q1 is 91 and Q3 is 114. The private school median is well above Q3
for the public schools. The Educational Records Bureau (Anon., 5)
reports a median I.Q. of 113.6 for 11,272 private school children, with
Q1 at 105.3 and Q3 at 121.4. The range is from 66 to 179. Only
3 per cent of the cases fall below I.Q. 90 and 29 per cent of the cases 
are above I.Q. 120. Private schools are in general very superior, but 
the medians of separate schools range from 99.5 to 124.3. 

The College Student. Edgerton (53) reports an average corre- 
lation of .52 between intelligence and college marks. He gives 
numerous correlations between intelligence and marks for various 
colleges. He finds that the correlation between intelligence and marks 
can be raised from .52 to .98 by the cumulative addition of each quar- 
ter’s scholastic record from the first to the sixth quarter. The C.E.B. 
Scholastic Aptitude Test correlates from .46 to .49 with freshman 
marks at Yale, according to Crawford (42). A multiple R of .74 
is obtained by using intelligence score, school records and age at 
entrance. Lefever (99) finds a correlation of .29 between the 
Thorndike Test and marks for 884 freshmen at the University of 
Southern California. He also gives the correlations between intelli- 
gence and marks in each subject. These correlations range from .10 
for Economics 4 to .49 for Pharmacy. Garrett (58) reports a correlation
of .41 between CAVD scores and college marks for 314
freshmen, and he gives separate correlations for each subject. The 
highest is .57 with English. Kaulfers (88) finds the mean I.Q. of
students taking Spanish to be lower than the mean I.Q. of students
taking French or German in the freshman year in college. Summer 
students make a higher score in intelligence tests than do winter 
students at Syracuse University, according to Keys and Reed (90), 
and the variability of the summer students is much greater. Lloyd- 
Jones (102) discusses the use of intelligence tests in student person- 
nel work at Northwestern University. 

There are six reports dealing with intelligence testing at teachers 
colleges and normal schools. Krieger (94) reports correlations of 
.46 between intelligence and first semester marks; .44 between intelligence
and second semester marks; .535 between intelligence and both
semesters’ marks. She also gives correlations between intelligence 
scores and various courses, the highest being with education .63; the 
lowest with fine arts .31. Whitney and Leuenberger (170) report 
correlations of from .48 to .50 between intelligence and marks for 907 
freshmen. Student mortality is heaviest in the lower deciles of intelli- 
gence. Garrison (59) and Cuff (44) report correlations between 
various intelligence, character and learning tests given to groups of 
students. Wagenhorst (163) finds no correlation for 191 teachers 
between intelligence or scholarship and success during the first year 
of service, and the correlation between ratings in practice teaching 
and the first year of service is only .23. Whitney and Frasier (169) 
summarize previous reports on the relationship between intelligence 
and student teaching success. The average of seven correlations is 
+.16 with a range from —.03 to +.35. Their own results show a 
correlation of +.24 for a group of 100 cases, and +.22 for another 
group of 70 cases. 

The Superior. There are many reports dealing with the child of 
high I.Q. Terman (156) follows up his superior group and reports
the results after six years. For 54 cases the mean I.Q. shows a
decrease from 148 in 1921 to 139 in 1927, and most of this decrease
is caused by the girls. For the older children the results cannot be
stated in I.Q. These older cases are in the 97th to 99th percentiles
of the Terman Group Test. The superior cases who are now students 
in Stanford University have a mean score on the Thorndike Test 
which is above the university mean. In general the majority of the 
cases are at approximately the same level of intelligence now as they 
were when tested six years ago. Duff (51) reports a follow-up of 
73 cases with I.Q.s above 135 in 1921-22 on the Northumberland
Test in England. He compares them with a control group of I.Q.s 
between 95 and 105. The superior send more replies to his inquiry. 
More of them have gone on to higher schools. A comparison of
seven control children now in secondary schools with thirteen of the
superior group who have not gone to secondary schools shows the 
superior better in spelling and language. The mean size of family 
of the superior is 3.5, of the control 4.1. 

Witty (175) reports results on one hundred gifted children with 
I.Q.s above 140. The social status of the group is high and the
ancestry is mainly English, Scotch, German, and Jewish. The mean 
I.Q. is 152 and the mean E.Q. 136. The gifted are much above the 
controls in two character tests. After five years he finds the same 
superiority of the gifted in physical characteristics, but the Terman 
Group I.Q.s are lower than the Binet I.Q.s of five years ago. The 
correlation between these I.Q.s from different tests five years apart 
is .66. Stoke and Lehman (151) go over the studies of superior 
children and point out that those of high social status provide a 
smaller number of superior children. Less than one-quarter of the 
children with I.Q.s above 120 came from the professional classes.

Lamson (96) reports a follow-up of Hollingworth’s gifted group 
of 56 cases. Most of them are now in high school. On Army Alpha 
all have A ratings, all are in the top decile of high school students 
and 53.6 per cent have already reached the top centile of adults in 
general. These gifted children are significantly superior to their 
comrades in scholastic achievement, although their chronological age 
is on the average two years less. The gifted excel the control in 
extra-curricular activities, and obtain better teachers’ ratings in con- 
duct. They have not suffered in health and 75 per cent say they are 
glad to have been accelerated in their school progress. Gerberich (63) 
finds that the superior high school student, who is in the upper decile 
in intelligence, graduates earlier than usual from high school and 
enters college early. The percentage of the gifted entering college 
is 68.8 as compared with 35 to 40 per cent for all high school grad- 
uates. In this gifted group 59 per cent are boys and 41 per cent girls. 
Students who enter college before age sixteen are studied by 
Gray (66). The average age for 126 boys and 28 girls in this group 
at Columbia and Barnard is 15.5. Their mean scores on the Thorn- 
dike Test are above the means for the whole student body. These 
superior students show slight superiority to other students in academic 
marks, but they took less time to graduate and gained more academic 
honors. They took part in more extra-curricular activities and did 
not seem to suffer in any way from their earlier college entrance. 

Hollingworth (81) reports the annual measurements in height 
from 1923 to 1929 inclusive for 47 gifted children with I.Q.s above
130. They remain constantly about 5 per cent taller than unselected
children. Hollingworth and Gray (82) find no correlation between 
the relative or absolute height or weight and the A.Q. for 50 gifted 
children. Inferiority in size is not a spur to greater accomplishment 
within a group of high I.Q.s working together in the same class.

The Feebleminded. Town and Hill (157) make an elaborate
study of all the cases sent back to Erie County from the state school 
for the feebleminded. These 136 cases were supposed to be fitted 
for life outside the institution. They find that only 14 per cent have 
made a fair adaptation. They believe that 63.4 per cent are absolute 
economic failures. Complete case records are given of every case. 
Willhite et al. (171) report that there are 3,637 feebleminded in the
State of South Dakota, and only 500 of these are in institutions. 
The incidence is about 0.5 per cent of the total population. The ratio 
of male to female is 59.1 to 40.9. Among 2,050 children, ages six 
to seventeen, 1.01 per cent are morons, 0.23 per cent are imbeciles 
and 0.06 per cent are idiots. Martz (107) reports that 10 out of 25 
children born of mothers of low I.Q. were found to be of average 
intelligence. The children’s I.Q.s are higher on the average than
those of their mothers. Chotzen (36) makes an extensive survey of 
children in special classes in Breslau, particularly from the physical 
standpoint. He gives the distribution according to degree of intelligence,
based on a general diagnosis and not wholly on intelligence tests.
He then finds that the mean Binet I.Q.s of 400 cases are: for those not
feebleminded 94, for borderline 87, for morons 77, for imbeciles 61,
for idiots 33. Bieber (13) takes the Binet definitions, similarities, 
differences and the like, as a basis for further questioning and con- 
versation with feebleminded children. He uses this material for a 
discussion of the thinking of feebleminded children. 

Delinquent, Dependent and Problem Cases. Adler (2) gives the 
percentages of 1,120 penitentiary prisoners making various ratings 
on the Army Alpha and says that these compare favorably with the 
army draft. In a state school for boys the mean I.Q. of 338 Cook 
County boys (Chicago) is 82 as contrasted with an I.Q. of 76 for 
435 “down state” boys. The I.Q. distribution of 369 cases in a 
juvenile detention home is given and 35 per cent are rated feebleminded
with I.Q.s below 70. Willhite et al. (171) in South Dakota
give the intelligence distribution of 586 male and 21 female peni- 
tentiary prisoners: 11.8 per cent of the men and 19 per cent of 
the women are rated as morons or lower. Of 3,164 girls tested 
at the Women’s Protective Association at Cleveland, Derby (50)
reports only 47 as ranking superior in intelligence with an I.Q.
above 110. This is less than 1 per cent as contrasted with a normal 
expectation of 20 per cent. McClure and Goldberg (110) find the 
mean I.Q. of 84 unmarried mothers to be 77.1 with a range from
38 to 109 and a mean C.A. of 18.4. Caldwell (25) finds that 65 
per cent of 408 boys and 78 per cent of 252 girls in industrial schools 
rate below I.Q. 85, whereas the normal expectation is only 11 per cent.

Paynter and Blanchard (129) report on 330 behavior cases at 
child guidance clinics, having excluded all cases with I.Q.s below 80.
The mean I.Q. of these 330 cases so selected is about 100.
Riley (135) reports the results of giving the Binet, the Arthur Per- 
formance and the Minnesota Mechanical Ability Tests to 65 pro- 
bation boys. The average M.A. is a year higher on the performance 
as contrasted with the Binet Scale. Coleman (38) finds no differ- 
ence in intelligence between 125 problem and 125 non-problem high 
school boys. 

One report on dependent children by Davis (46) gives the I.Q.
distributions for 1,051 cases in children’s homes on two group tests. 
The percentage estimated as feebleminded is from 15 to 17; the per- 
centage having I.Q.s above 110 is 5.3 on the Dearborn Test and 
10.2 on the Haggerty Test. 

The Deaf and Blind. Brown (23) finds a correlation of .80 
between the Pintner Non-Language Test and the Arthur Perform- 
ance Scale for 333 deaf pupils. This drops to .61 with C.A. constant. 
Both intelligence tests correlate about .40 with arithmetic score on 
the Stanford Achievement, but zero with reading score. Peterson 
and Williams (130) give a distribution of the Goodenough Drawing 
I.Q.s for 330 deaf children, ages four to thirteen. The mean I.Q. is
79.5, and 25.4 per cent are below 70 I.Q. Beilinsson (12) describes
a special test of lip-reading ability. Hard of hearing children are 
investigated by Sterling and Bell (148). Their report on the intelli- 
gence of 585 cases gives only “above,” “at” or “below average 
I.Q.” The percentages of hard of hearing (having nine or more 
units loss) are 0.6, 1.6, and 3.7 in the above three categories, as compared
with the normal in hearing, whose percentages are 74.2, 71.7
and 60.8. 

Merry (115) reports the first attempt to test a deaf-blind case. 
The Binet Tests are not suitable. Certain performance tests can be 
given. The time records are much longer than usual. Hayes (74) 
finds that the blind are 10 points in I.Q. below the seeing on tests
which are suitable to both groups. In Myers’ (119) survey of sight-saving
classes in the United States he gives a distribution of the
I.Q.s for 709 cases tested by means of intelligence tests. Of these
cases 58.9 per cent were below 90 I.Q. and 9.4 per cent above 110 I.Q.

Racial Comparisons. Garth (61) sums up the intelligence testing
of the past five years. He presents a very useful table of results and 
a bibliography of 176 titles. Witty and Lehman (177) discuss in 
general the results of intelligence testing among different racial 
groups. They conclude that none of the differences are certain and 
that they are not innate. A bibliography of 67 titles is given. 
Three studies deal with negroes. Garth (62) presents results 
for 2,006 southern negro children in Grades IV to IX, ages six to 
twenty, tested by the Otis Classification Test. The average I.Q. is 
76 or 78. The percentage reaching the white median ranges from 
32 at age nine to 1 at age seventeen. The average E.Q. is 77 and the
average A.Q. is 103. He argues for the great influence of educational
opportunity on the I.Q. Hurlock (83) finds white children
slightly better than negroes on the Otis test of suggestibility. The
average I.Q. of 194 white children is 102 as compared with an I.Q.
of 93 for 210 colored children. Graham (65) compares negro and 
white college students on many tests. The mean superiority of the 
whites on all tests is +.51 Q. The overlap of the negro on the 
white in the rational learning test is 36 or 37 per cent. The 
superiority of the white groups in separate intelligence tests is as 
follows: Army Alpha +.57 Q; Myers Mental Measure +.86 Q; 
Otis Higher S.A. +.68 Q. 
Two studies give a comparison of English and American school 
children on American intelligence tests. Wood (178) compares 1,260 
English children with the norms for American private schools. The 
mean I.Q. of the English is 114.4, of the American 114.1. The I.Q.
distributions of the two groups are very much alike. Powers (133) 
reports results for 253 English secondary school pupils on the Terman 
Group Test. The mean score of these cases with an average C.A. of 
15.8 is much above the norm for American children in Grade XII 
with a C.A. of 18. These English pupils score higher than children
in American private schools. 
Winch (174) compares Christian and Jewish children in a London 
East-end elementary school on his reasoning test. The Jews are 
superior to Christians equated for age and social status. The Jewish 
boys surpass the Christian boys to a greater extent than the Jewish 
girls surpass the Christian girls. Delmet (49) gives the results of 
various intelligence tests on Mexican children. The average retardation
by grades of the Mexican children as compared with the author’s
norms varies from 4 months to 1 year and 2 months. 

Employment and Guidance. Beckham (11) tabulates from 
previous workers the occupations suitable for mental levels 5 to 12 
inclusive. At year eight he finds a considerable amount of responsi- 
bility, and at years ten to twelve much responsibility and supervision. 
He reports his own results for 20 laundry employees and gives the 
average M.A. according to supervisors’ ratings of their work. There 
are two studies on nurses in training. Metcalfe (116) reports the 
Army Alpha ratings for 331 nurses who graduated. The percentage
of A or B ratings for the successful nurses is 87.3; for the failures
the percentage is 44. Intelligence and theory grades correlate +.40;
intelligence and practice grades +.14. MacPhail (104) reports
results on the Brown University Test with probationers. Of those
scoring less than 45 points, 8 out of 10 fail. The correlation with
academic grades is +.76 for 35 probationers. For 32 juniors this
correlation is +.50.

Taylor (155) gives I.Q.s for young printers. Compositors’ 
apprentices and continuation school students average about 92 or 93 
I.Q. points, but the average of pressmen’s apprentices is only 67.
Treat (160) deals with girls below I.Q. 70, and finds a correlation of
+.36 between M.A. and garment machine operation. In training for 
power machine operating an M.A. over 8 is necessary, and in addition 
emotional stability and a certain measured ability on mechanical tests. 

Sex Differences. Broom (19) compares 600 boys and 600 girls 
in Grades VII to IX on the Terman Group Test. The mean scores 
vary by a few points in favor of the boys but the difference is not 
statistically reliable. He also reports results for each sub-test. In 
general he finds no real sex difference. Hardie (70) in England 
reports results for 761 secondary school pupils, ages eleven to fifteen 
on an intelligence test. The mean score of the boys is exactly the 
same as the mean score of the girls, but the boys have a slightly 
larger S.D. Similarly Brolyer (18) finds that girls have smaller 
sigmas than boys on the C.E.B. Scholastic Aptitude Test. The girls
are significantly superior to the boys on the verbal part of the test, 
while the boys are superior to the girls on the mathematical part. 

Meltzer and Bailor (114) find no difference in Otis S.A. score 
between 32 women and 34 men college students. Wallin (164) 
reports results for many cases referred to psychological clinics for 
entrance to special classes. The I.Q. distribution for 1,019 cases in 
St. Louis gives a median I.Q. of 70.5 for boys and 63.8 for girls.
Twice as many boys are sent for examination. For another group of 
3,644 cases, he finds a median I.Q. of 74.4 for boys and 71.8 for
girls. The percentage of boys below I.Q. 35 is 1.0; of girls 0.9. 
Similar results are reported for a third group of 1,114 cases. In 
general there are fewer girls than boys; the median I.Q. for girls is 
lower; more girls fall below I.Q. 70; but more boys fall below 
I.Q. 35. Witty (176), however, finds that girls test slightly higher
than boys, having a median I.Q. of 98.1 as compared with 97.1, for 
1,049 clinic cases, made up of 585 boys and 464 girls. The range 
of I.Q.s is from 32 to 178. 

Inheritance. Schieffelin and Schwesinger (140) give a survey of 
the main work on the inheritance of intelligence with many biblio- 
graphical references. Davis (47) finds the following correlations 
for various pairs of orphan children: .41 for 320 sibling pairs; .03 
for 100 unrelated pairs; .77 for 23 twins. For non-orphan children 
he reports correlations of .52 for 106 sibling pairs, and .11 for 100 
unrelated pairs. For sibling pairs in the orphanage for 0 to 3 years 
the r is .51; for 4 to 6 years the r is .34; for 7 to 9 years the r is .48. 
For unrelated pairs in the orphanage for these three periods of time 
the correlations are .06, .07, .06 respectively. 

Hirsch (79) reports on 58 pairs of dissimilar twins living 
together, 38 pairs of similar twins living together and 12 pairs of 
similar twins living apart. Detailed data are given for all cases. 
The average difference in I.Q. for twins having similar environment
is 13.8 points for dissimilar pairs and 2.3 points for similar twins.
Thus the dissimilar twins show six times as much difference
in I.Q. as do the similar, in spite of the similar environment, and
hence such differences must be largely due to inheritance. The correlation
for 38 pairs of selected similar twins is .97; for 58 pairs of
dissimilar twins living together the correlation is only .53. He con- 
cludes that heredity is about five times as significant as environment 
in determining I.Q. differences. Bakwin (8) reports on 20 pairs 
of identical twins and finds the I.Q.s to be similar for all pairs
except one. 
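
The six-to-one ratio attributed to Hirsch above follows directly from the two mean differences quoted. A minimal sketch of that arithmetic is given below; the helper name `difference_ratio` is an assumption introduced for illustration and is not part of Hirsch's procedure, and the ratio by itself does not establish his further conclusion about heredity and environment.

```python
def difference_ratio(dissimilar_mean_diff: float, similar_mean_diff: float) -> float:
    """Ratio of mean within-pair I.Q. differences (hypothetical helper)."""
    if similar_mean_diff <= 0:
        raise ValueError("similar_mean_diff must be positive")
    return dissimilar_mean_diff / similar_mean_diff


# Mean I.Q. differences quoted in the text for twins living together.
ratio = difference_ratio(13.8, 2.3)
print(f"Dissimilar pairs differ about {ratio:.1f} times as much as similar pairs")
```

The check covers only the comparison of mean differences; the correlations of .97 and .53 are separate statistics and are not derived from it.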

Stoke (150) investigates social status and intelligence for 508 
cases, ages six to eleven. The correlation between parents’ occu- 
pational level and I.Q. is +.30. He points out that the number (not 
percentage) of children of high I.Q. (above 110) is important. The
number of high I.Q. cases contributed by the high ranking occupa- 
tional levels is very small, because these levels are sparsely represented 
in the total population. Dealing with high school seniors in many
schools, Woody and Bergman (179) give the average intelligence
rating in terms of S.D. for occupational levels of fathers. These
range from +.24 sigma for “professional” to -.28 sigma for
“farming.” In this ranking “business” falls slightly below
“artisans.” Linfert and Hierholzer (100) find an average correlation
of only +.06 for different groups of infants between socio-economic
status and intelligence. But Witty (176), reporting on 458
pre-school cases, ages three to six, finds a correlation of +.48 between 
M.A. and Barr rating of fathers. 

Miscellaneous. Two studies deal with the only child. Guilford 
and Worcester (67) compare 21 only with 141 not-only children in 
Grade VIII. The average I.Q. of the only is 108; of the not-only 
103. The only are superior in occupational status, school marks, 
health habits and other personal characteristics. Blonsky (16) com- 
pares 33 only with not-only children in Grade I. He finds their 
average I.Q. to be six points above the average of not-only cases.
Almost 50 per cent of the only children are above +1 sigma in M.A. 
of the total distribution. He discusses their character qualities and 
points out his disagreement with Fenton’s findings in the United 
States. 

Hauck (73) finds that bi-lingualism has an inhibiting influence 
on mental development in his comparison of Upper Silesian with 
other German children. Allen (3) investigates the effect upon an 
individual produced by knowledge of his own intellectual level. In 
general he finds it to have no influence on test or college achievement. 
Haefner (68) finds no difference in intelligence between left-handed 
and right-handed children. Babcock (7) presents a method of meas- 
uring the amount of mental deterioration. Hetzer (75) discusses 
various cases of infants tested by means of her “Babytests.”
Retarded physical development is not compensated for by advanced 
mental development. Kovarsky (93) gives examples of Rossolimo 
profiles, and Cattell (31) finds that Otis I.Q.s are not comparable
to Binet I.Q.s.

Hilleboe (78) gives a survey of all types of special classes and 
the methods used for the selection of children for such classes. 
Knight and Manuel (92) find that children who enter school at 
age six surpass those who enter at age seven in their high school 
course. They give no intelligence tests, but estimate I.Q.s from
parental occupation and conclude that the superiority of the younger 
group is not due to intelligence. Wilson (173) contrasts bright and 
dull children in the learning of a motor memory task, and Carroll
(28) does the same for learning to spell. Kaulfers (86) finds
that achievement in Spanish depends upon intelligence in addition 
to previous work, and the same author (87) finds that teachers’ 
guessing of the probable foreign language ability of pupils correlates 
as high with final marks as do measures of intelligence. Coy (41) 
discusses various factors that influence the A.Q. and Morley (117) 
finds that the reliability of the A.Q. decreases with increase of 
correlation between mental and educational tests.

I. GENERAL


The most important trends which have been noted during the 
calendar year under review are, first, the ever-growing tendency for 
testing materials and methods to be incorporated in textbooks and 
supplementary teaching materials, and, secondly, the development of 
tests in the field of appreciations and attitudes. These two points will 
be discussed in some detail later, the former in section III and the 
latter in section II. 

Three general textbooks on educational measurement have ap- 
peared during the year. Hildreth (50) has prepared an up-to-date 
treatment of the application of measurement and other psychological 
methods to the study of individual pupils in the schools. The book is
addressed especially to school psychologists and to teachers and 
supervisors who have had considerable experience with testing tech- 
niques. Probably no other work in educational measurement shows 
so clearly the trend away from the survey emphasis in measurement 
and in the direction of individual diagnosis and adjustment. 

Madsen (81) has prepared a textbook which specializes on the 
problems of measurement in the elementary school. In this book 
the author has been very successful in relating the facts of measure- 
ment to the broad problems faced by teachers and supervisors in the 
lower school. The relation which the testing movement bears to 
the improvement of methods and materials of instruction is pointed 
out in connection with each of the school subjects. Also the big 
problem of the relationship between measurement and the objectives 
of instruction comes in for consideration here and there. In no 
one section is the issue between the realists and the platitudinists in 
educational philosophy debated, but throughout the book one is made 
conscious of the fact that the measurement movement is definitely
allied with that educational philosophy which assumes that no one
can be sure that objectives are being achieved until these objectives 
are stated in terms of responses which it is desired that the learner 
make and these responses are measured. 

The third textbook which appeared during the year is that by 
Russell (122). The major emphasis in this book is upon the tech- 
niques of handling test results. It gives a very detailed account of 
the use of age, grade, and T-scores in solving certain school 
problems. 

In addition to the general texts, two work-books have appeared. 
Greene (44) has prepared a work-book which is designed to give 
practice in the basic techniques of measurement. The topics covered 
parallel pretty closely those covered in the textbook by Greene and 
Jorgensen which was published in 1929. Park (103) has constructed 
a set of problems to guide students in the mastery of the basic 
principles and techniques of measurement. Most of the material 
covered is connected with the traditional problems of survey testing. 

The usual annual review of the developments in the field of educational
tests was prepared for the PSYCHOLOGICAL BULLETIN by
Jones and Crook (62). The review was based on 162 articles and 
books. 

Several studies have been made to determine the degree to which 
measurement is spreading to the college level. Four chapters in the 
Eighteenth Yearbook of the National Society of College Teachers 
of Education are devoted to this problem. A chapter by Rauben- 
heimer and Touton (113) reports the degree to which standardized 
tests are used in higher institutions. Questionnaires were sent to 
308 colleges, and of the 159 replying to the question about the use 
of achievement tests 131 stated that one or more such tests were being 
used. The test which was being used with greatest frequency was 
the Iowa Placement Examination. Hudelson (56) has scanned the
literature for data bearing on the evaluation of teaching in institu- 
tions of higher learning. He finds little or no use being made of 
any objective measurements by supervisory officers to determine the 
efficiency of instruction. Haefner (46) hunts for evidences that
quantitative measurement has influenced curriculum construction in 
colleges and professional schools, but he finds little to report at this 
level. A chapter by Manuel (83) is directed to the study of the atti- 
tudes of 108 professors of education toward the application of 
measurement techniques at the college level. It was found that
the so-called new-type tests were meeting with much favor, and that
appreciable interest is being taken in prognostic testing. 

Trinidad (147) has written a brief discussion of the use of 
standardized tests in normal schools. A comprehensive study of the 
colleges of the Lutheran Church in America has been made by 
Leonard, Evenden, and O’Rear (73). Of the 13 institutions ques- 
tioned with respect to the use of standardized tests, one reports the 
use of achievement tests while two use intelligence tests. 

The general problem of the relation of measurement to the aims 
in education—which was mentioned above in connection with Mad- 
sen’s book—has been discussed in an article by Barr (2). He says 
that the two most important tendencies in education today, that of 
the development of measurement and that of the growing interest 
in non-informational aims, are in conflict inasmuch as the attention 
in measurement has been so largely centered on the testing of in- 
formation. No one can deny that specialists in measurement must 
ever be alert to new objectives which are evolved as the school 
attempts to meet the changing demands of civilization. However, 
in view of the development in the last few years of measures of atti- 
tudes and appreciations, it seems that the conflict which Barr men- 
tions is less serious than he indicates. In one series alone, that of 
Thurstone (144), there are scales already available for the measure- 
ment of attitudes toward God, toward the Church, toward war, 
toward the Negro, and toward birth control; and scales have been 
planned to cover attitudes on 26 other problems. Moreover, in 
the past year four tests in the field of appreciation have been devised. 
These will be described briefly in the next section under new 
survey tests. 


II. THE DEVELOPMENT AND USE OF TESTS FOR SURVEY AND
EXPERIMENTAL PURPOSES


(a) Use of Tests in Studies of a Survey Nature. A comparison
of the achievement of pupils in the schools today with
that of pupils in the schools of bygone days has been the subject of
three interesting studies. One of these studies was conducted by
Fish (35). A complete set of examinations given in 1853 was found
together with a tabulation of the scores made at that time by 20
pupils. The examination was designed as an entrance examination
to high school and was administered at the end of “nine pre-high-school
grades.” In 1929 this examination was given again in the
same city to 200 pupils selected at random from the eighth
grade. The scoring was done by one individual and unfortunately 
no facts are given as to the reliability of the grading. However, 
since the examination consisted exclusively of factual questions, 
many requiring only one-word answers, it seems probable that the 
reliabilities of the averages were fairly high. The results taken as a 
whole are decidedly favorable to the 1929 group. The average 
number of errors made by the 1853 group was 16.2, whereas the 
average number made by the 1929 group was only 8.9. The greatest 
differences appeared in arithmetic where the former group made on 
the average 5.4 errors, while the latter made 1.6. In grammar the 
averages were 6.5 as compared with 3.1; and in geography 4.4 
as compared with 4.2.¹ Another study of this same character,
though covering a much shorter interval, was conducted by
Tyler (149). The cooperation of various research workers in the 
schools of Ohio was obtained in unearthing old tests and administer- 
ing them to comparable groups of present-day students. Data thus 
obtained on the differences in achievement were almost uniformly in 
favor of the present-day groups. Some of the differences were large 
enough to be considered statistically reliable. Remmers (114) com- 
pared the results obtained by the 1919 and 1929 freshmen in 
engineering at Purdue University on an identical test. He found that 
the 1929 group exceeded the other by a significant amount in four 
out of five subjects tested. 
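
The error averages quoted above for Fish's study can be restated as absolute and relative reductions. The short sketch below does only that arithmetic on the figures given in the text; the helper name `reduction` and the dictionary layout are assumptions made for illustration, and nothing here bears on the reliability of the original scoring.

```python
# Average errors per pupil as quoted in the text: (1853 group, 1929 group).
errors = {
    "all subjects": (16.2, 8.9),
    "arithmetic": (5.4, 1.6),
    "grammar": (6.5, 3.1),
    "geography": (4.4, 4.2),
}


def reduction(old: float, new: float) -> tuple[float, float]:
    """Return the absolute and percentage drop in average errors (hypothetical helper)."""
    drop = old - new
    return round(drop, 1), round(100 * drop / old, 1)


for subject, (e1853, e1929) in errors.items():
    drop, pct = reduction(e1853, e1929)
    print(f"{subject}: {drop} fewer errors on average ({pct} per cent lower)")
```

On these figures the gain is concentrated in arithmetic and grammar, while geography is nearly unchanged, which agrees with the description in the text.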

Such studies, of course, cannot be taken as conclusive evidence 
that the schools today are using their facilities to better advantage 
than the schools of the earlier day, because too many variables are 
left uncontrolled. However, the superior performance of the pupils 
today on a test like Fish’s, which was based on the objectives and
curriculum of the days of the children’s grandfathers, does speak
well for their breadth of knowledge and skill.

¹ It is worthy of note in passing that these findings agree very closely with
those obtained in an almost identical study made by John L. Riley in Springfield,
Massachusetts, in 1905 and published in the Springfield Republican for
November 12, 1905. In this study a set of examinations given in 1846, covering
arithmetic, spelling, and geography, was repeated after a period of 59 years.
About 80 pupils of the ninth grade were tested originally and about 220 in the
corresponding grade were tested at the later date. The papers were scored by
the same method in the two cases, and the results reported in percentage values.
In each of the three subjects tested the average percentage score of the 1905
group exceeded that of the 1846 group; in arithmetic the average difference
between the two percentage scores was 36, in spelling it was 11, and in geography
it was 13.

A wide variety of testing was done for surveys and descriptive 
purposes. Gerberich (41) reports the results obtained in the Iowa 
high-school testing program over a period of five years. He finds 
that 68 per cent of the pupils in the highest decile of high-school 
graduates enter college, whereas only 35 or 40 per cent of all gradu- 
ates enter. Eells (32) reports the results from a mental-educational 
study of 11,000 junior college students. The boys exceeded the girls 
in all subjects except English. Woody (164) reports a large amount 
of data collected in a testing program in the elementary and high 
schools of Michigan. Stalnaker (133) presents some facts coming 
out of the orientation testing program at Purdue University. Carreon 
and others (13, 14, 15, 135) report the standing of the children of 
the schools of the Philippine Islands on tests in the three R’s and
home economics. Dearborn and Cattell (28) investigate the ac- 
complishment quotients of the children in three private schools. They 
find the median A.Q. to be up to 100 in only one out of the three 
schools. The most extensive testing of the purely survey type which 
is being done by any one organization is that in connection with the 
so-called Annual Nation-wide Testing Program which is being put 
on by the Public School Publishing Company. Torgersen (145) 
summarizes the results from the Seventh Program, in which 225,000 
pupils in 46 states were tested on the Public School Achievement
Tests.

Caswell (18) has made a careful study of school surveys, and
he notes that the trend of emphasis in the measurement sections of 
such surveys is definitely in the direction of individual diagnosis and 
treatment of pupils rather than in the direction of standardization. 

(b) Use of Tests in Experiments. Burks, Jensen, and Ter- 
man (11) report a follow-up study of the achievement of gifted 
children. A large group of children with intelligence quotients of 
140 and above were selected for study in 1922. In 1928 an attempt 
was made to discover if these children were maintaining in school 
achievement a lead over the average run of pupils which was con- 
sistent with their intellectual gifts. On this examination, which 
covers English, mathematics, science, and history, the average score 
of the gifted girls was between 1.0 and 1.5 control σ’s above the
norm, while the average score of the boys was between 1.5 and 2.0
control σ’s above the norm. In terms of percentiles the average
score of the gifted group on this test was 92, which means that only
eight of the children out of 100 exceed the average score made by
this group. The superiority of this group in achievement over the
general run of pupils in high school cannot be doubted in light of
these results. However, it is impossible to say whether or not these
pupils are achieving at as high a level as is consistent with their
I.Q.’s found six years earlier.¹

Jensen and Jensen (61) review the literature on the influence 
of class size upon pupil achievement, and find the results conflicting. 
In an experiment of their own, in which they confine their attention 
to high-school algebra, they find significant differences in favor of 
small classes. Bloomfield and Brooks (4) also studied this problem 
at the high-school level, but, contrary to the findings of Jensen and 
Jensen, they do not find significant differences between small classes 
and large ones. 

Holy and Sutton (55) report an experiment on the value of 
homogeneous grouping for algebra instruction. Two large sections 
equal in size and in ability were used. One class was made up of 
children as nearly homogeneous in intelligence and algebra achieve- 
ment as possible; the other was heterogeneous. The experiment ran 
for 17 weeks. At the end of this time the homogeneous group ex- 
ceeded the other on several standardized tests, but the differences 
were not large enough to be statistically reliable. Tharp (142) 
studied the problem of sectioning in Romance Languages by means 
of test results. He found that sectioning could be done quite ac- 
curately by means of the Foreign Language Test of the Iowa Place- 
ment Examination. His results show that the inferior students 
profited most from sectioning. Miller and Henry (86) here reviewed 


1 Since the lowest I.Q. in the group was 140, we should expect only about 
4 children in 1,000 to equal the poorest of the group in general intelligence, and 
yet 8 pupils in 100 in the senior class in high school are exceeding the average 
of the gifted group. The authors do not attempt to say whether the selection 
in high school, which of course serves to boost the norms, is enough to account 
for this difference or not. But from the following calculation made by the 
writers it seems doubtful. According to F. M. Phillips (A Graphic View of 
Our Schools, Houghton Mifflin, 1927) out of 10,000 pupils who enter the fifth 
grade, 1,390 graduate from high school. On the basis of the known distribution 
of intelligence scores we should expect only 40 of these to exceed 140 I.Q., 
which is the lowest score of the gifted group, but this study reports that in 
achievement 111.2 (i.e., 8 per cent of 1,390) exceed the average score of the 
gifted. 
20 studies which have been conducted in the last 10 years on 
homogeneous grouping. 
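
The arithmetic in the footnote above on the gifted group can be retraced in a few lines. The sketch below is only an illustration: it uses the figures quoted in the footnote, the variable names are ours, and, like the footnote, it assumes that every pupil above I.Q. 140 stays in school until graduation.

```python
# Figures quoted in the footnote (Phillips, 1927, and the study under review).
ENTRANTS = 10_000          # pupils entering the fifth grade
GRADUATES = 1_390          # of those, pupils who graduate from high school
RATE_ABOVE_140 = 4 / 1000  # expected share of children at or above I.Q. 140
SHARE_EXCEEDING = 0.08     # seniors exceeding the gifted group's average

# Assumption (as in the footnote): every pupil above I.Q. 140 stays to graduate.
expected_gifted = ENTRANTS * RATE_ABOVE_140
observed_exceeding = GRADUATES * SHARE_EXCEEDING

print(round(expected_gifted))                          # 40
print(round(observed_exceeding, 1))                    # 111.2
print(round(observed_exceeding / expected_gifted, 2))  # 2.78
```

On these figures, nearly three times as many seniors exceed the gifted average as the distribution of intelligence alone would lead one to expect, which is the basis of the writers' doubt.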

Smith (130) conducted a study to determine how accurately 
teachers judged the difficulty of tasks required of school children. 
Items from the Stanford Achievement Test were chosen as materials 
for the teachers to rate. The correlations between the true difficulty 
of the material and the teachers’ judgments ranged from .45 to .86. 

Scott (124) compared the educational achievement of native-born 
white children and Mexican children in the schools of El Paso, 
Texas. The Mexican children were found to be inferior in reading 
and arithmetic by a significant amount at most of the age levels 
covered. 

Hollingworth and Gray (53) conducted an experiment with 50 
superior children to test the Adlerian hypothesis that physically in- 
ferior individuals compensate for their inferiority by exercising 
special effort along other lines. It was assumed that if the 
hypothesis is correct then accomplishment quotients should cor- 
relate negatively with size. However, when the children were 
arranged in order of size from largest to smallest, there was no 
tendency found for the children in the lowest quintile to be different 
from those in the upper quintile in A.Q. when intelligence was 
allowed for. The experimenters conclude that whatever may be “the 
inner urges arising from perception of physical smallness, they do 
not in this sample exert any appreciable effect upon measurable 
scholastic performance.” In this connection the work of Adams (1), 
who studied the relations existing among physique, intelligence, and 
proficiency in school subjects, is interesting. He found a low nega- 
tive correlation between size and achievement when intelligence was 
partialed out. 
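
"Partialing out" intelligence refers to the first-order partial correlation. A minimal sketch of that formula follows; the three coefficients are hypothetical values chosen for illustration and are NOT taken from Adams or from Hollingworth and Gray.

```python
import math

def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical coefficients, for illustration only.
r_size_achievement = 0.10
r_size_intelligence = 0.20
r_achievement_intelligence = 0.60

r = partial_correlation(
    r_size_achievement, r_size_intelligence, r_achievement_intelligence
)
print(round(r, 3))  # -0.026
```

With these invented values a small positive raw correlation between size and achievement becomes slightly negative once intelligence is held constant, which is the kind of shift the partial coefficient is meant to expose.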

Keys and Whiteside (64) found that among children of the same 
age, sex, and intelligence those markedly inferior in emotional 
stability were also distinctly inferior in educational achievement. 
Hildreth (51) reports some interesting and important data on chil- 
dren’s growth in achievement from Grades II to VIII, as measured 
by the Stanford Achievement Test. 

Stone (137) has experimented with his own practice tests in arith- 
metic to determine their effectiveness as an aid in teaching. He con- 
cludes that the use of tests produced a greater gain in ability than 
the regular work in arithmetic. The greatest gains were made by 
pupils with the highest intelligence quotients. 

(c) New Tests of a Survey Nature. The most interesting and 
probably the most significant development in the field of new tests 
this year has been the attempt to measure attitudes and appreciation. 
The notable work of Thurstone (144) in the measurement of social 
attitudes has already been referred to. Moore (87) has experimented 
with a test designed to measure scientific attitudes in certain science 
work. By means of this test he attempts to study the important prob- 
lem of the relation between scientific attitudes and factual knowledge. 
His results, on the face of them, indicate that there is a close rela- 
tion between the two, but it is not certain that the measures of 
scientific attitudes used were wholly valid. 

McAdory (78, 79) describes the construction of a rather elaborate 
art appreciation test (r11 = .87).¹ Speer (132) has prepared some 
interesting materials for the measurement of appreciation of poetry 
and prose (r11 = .67 to .78). Hevner (49) reports preliminary results 
on a test for appreciation in music. The test consists of 14 items 
from the classics, each of which is presented in original form and 
in three variations. Vernon (151) proposes a method for measuring 
musical taste (r11 = .85). 

Another new extension of educational tests is in connection with 
the measurement of the knowledge of the pre-school child. Buck- 
ingham and MacLatchy (10) have constructed a test consisting of six 
parts to be used in the study of the knowledge of number concepts 
possessed by children when they enter the first grade. The authors 
find, for example, that 90 per cent of the children examined could 
count to 10, and that 78 per cent of them could count out and hand 
to the teacher eight objects. Many other detailed facts of this type 
are given which may serve as norms. Sangren (123) has devised 
a rather comprehensive information test for use at the pre-school 
level and in primary grades. It covers vocabulary, number, nature 
study, social and civic information, literature information, and house- 
hold information. Norms are given. The correlations of different 


1 Due to limited space only a very brief mention can be made of most tests. 
Abbreviations will, therefore, be adopted in reporting reliability and validity 
coefficients. r11 will stand for the reliability of a test, and r1c will stand for 
the coefficient of correlation between the test and some outside criterion. Of 
course it will be impossible to interpret these correlations with any great 
accuracy without knowledge of the range in age or grade upon which they 
were based. However, in the space available only one or two statistical facts 
can be given for each test, and these coefficients are more significant than any 
other single figures which the authors report. 
parts with mental age range from .46 to .73. Rockwell (117) discusses 
the use of the Cleveland Kindergarten-Achievement Test. 

In the regular subject-matter fields a great number of new tests 
have appeared during the year. A tentative scale for rating prose and 
poetry of young gifted writers has been devised by Jensen and re- 
ported in a book by Burks, Jensen, and Terman (11). The scale 
consists of graded specimens ranging from the level of average per- 
formance of tenth-graders up to a level represented by the average 
work of some of the best writers in the English language (r11, that is, 
3 judges against 3 judges = .77). Otis and Orleans (101) have 
started a series of annual graduation examinations for use in 
the last grade of the elementary school. No data on norms, relia- 
bility, or validity will be available for these tests until they have 
already been used over the country and the results reported back to 
the authors. A new test will be made each year. Educationally, it 
is hard to see the advantage of such a test over the many standardized 
tests with established norms and reliabilities. The Public School 
Achievement Test (146), covering eight subjects of the elementary 
school, has appeared. Grade norms are given (r11 for separate subjects 
ranges from .80 to .96). A new series of tests has been 
added to the Iowa Placement Examination (59). The tests in this 
battery appear to be among the best in the field. 

Siebert and Wood (127) have prepared an aural French test for 
use at high-school and college levels (r11 = .97). Clapp and 
Young (22) have devised a test to measure grammar, capitalization, 
punctuation, and word forms (r11 = .85). The same authors (21) 
have devised an arithmetic test (r11 = .79). A unique feature of 
these tests by Clapp and Young is the ingenious self-marking device 
which is employed. Hartley (47) has made a test which is designed 
to measure children’s ability to interpret poetry (r11 = .85 to .95; 
r1c = .80). Poley (108) has devised a test to measure high-school 
students’ ability to get the gist of what they read (r11 = .80). Nelson 
and Denny (95) have published a reading test which measures word 
knowledge and ability to understand paragraphs. Percentile norms 
are given for high-school and college levels (r11 = .91). A test designed 
to measure the word knowledge of Italian children in the 
primary grades of the American schools has been prepared by 
Hill (52). Pressey and Pressey (111) have prepared a word reading 
test for use in the first grade (r11 = .93). Breslich (6, 7) has 
produced an algebra test and a geometry test. Tentative norms are 
available. Presson (112) has prepared a biology test. Percentile 
scores are given (r11 = .83 to .92; r1c = .79 to .84). Persing (107) has 
prepared a test for laboratory knowledge and techniques in chemistry 
(r11 = .87). A test on the United States Constitution has been designed 
by Bear (3). Grade norms are given (r11 = .96). Tyler (150) 
has prepared a test which purports to measure ability to generalize in 
general science. Cuff (27) has devised a new vocabulary test 
(r11 = .87). Woody (167) discusses the development and use of a 
vocabulary test to measure the growth in students’ vocabulary during 
periods of study of Latin and French. DeMay and McCall (30) have 
prepared a brief survey test in fractions to accompany their Standard 
Test Lessons in Fractions. The survey test is designed to identify 
early those pupils who should receive diagnostic and remedial 
treatment. 

Wood and Lerrigo (163) have designed an important test of 
health knowledge and habits for use over the whole school range. 
It consists of three parts: the healthy organism, the healthy per- 
sonality, and the healthy home and community. 

Engle and Stenquist (34) have published a home economics test 
covering foods and cookery, clothing and textiles, and household management. 
Complete age and grade norms are given (r11 = .90 and 
above; r1c for different parts ranges from .52 to .90). Two 
mechanical drawing tests have appeared: one by Nash and Van 
Duzee (93) and the other by Wells and Laubach (155). McLaughlin (80) 
has devised a test of shorthand (r1c = .73). 
Mann (82) has prepared a test to measure certain aspects of 
engineering education. 


III. THE USE AND DEVELOPMENT OF TESTS IN DIAGNOSIS AND 
REMEDIAL TEACHING 


(a) General Trends in the Development and Application of 
Diagnostic and Practice Tests. As was stated in the first section of 
this article, one of the most important trends in educational testing 
today is the trend toward the incorporating of test material and 
methods in the construction of textbooks, practice exercise books and 
other teaching aids. This trend is well illustrated in a discussion by 
Gates (40) of the advantages of modern practice tests over old- 
fashioned drill methods for the elimination of specific defects. This 
trend is even more strongly emphasized by the appearance in many 
fields of practice exercise books, unit tests, and the like. The following 
is a partial list of such material which appeared during the year: 
Strayer, Mort, and Dransfield (138), check tests to accompany a 
supplementary geography text; Webster (152), oral tests in English; 
Smith, Reeve, and Morss (128), exercises and tests in algebra; 
Goddard (42), practice exercises in algebra; Witchcraft (162), prac- 
tice exercises for arithmetic; Wrentmore (168), practice exercises 
for grammar and language usage; Smith (129), practice exercises in 
physics; Hyde (58), unit tests in American history. 

Probably the most challenging article of the year so far as 
technical trends within the field are concerned is one by Greene and 
Buswell (43) on diagnostic methods in arithmetic. These writers 
make a distinction between group and individual diagnosis, and indi- 
cate that most of the so-called diagnostic tests available today are 
good for group diagnosis but that they are relatively useless for 
individual diagnosis. Their main contention is that the chief func- 
tion of individual diagnostic testing is to supplement the discovery of 
types of errors by the discovery of causes of errors, and they main- 
tain that the only way to discover these causes is to observe the child’s 
method of work as he attacks problem after problem. In other words, 
it appears that they deny the possibility of satisfactorily diagnosing 
a pupil’s difficulties by any test whatsoever; the test, according to 
their scheme, would be used only as a device for eliciting responses, 
but the main part of the diagnosis would be the careful observation 
of each step taken by the child as revealed by such overt behavior as 
thinking-out-loud, hesitating, erasing, etc. In the minds of the re- 
viewers, there is no doubt that pupils’ final answers to problems in 
diagnostic tests do not always reveal the exact strengths and the 
weaknesses in their knowledge, and that in some cases the study of a 
child’s method of attack upon a problem is an excellent procedure, 
but there are many cases where the tests, for all practical purposes, 
exclude one possibility after another so that an analysis of the 
problems missed will serve so to limit the field in which the real 
difficulty lies that remedial teaching can be directed toward the weak- 
ness with rather great precision. The authors have made out an 
excellent case for the method in which they are most interested, but 
it appears that it would be unfortunate if all the development in indi- 
vidual diagnostic testing were bent in the direction of this one 
method. 

(b) Experiments in the Use of Diagnostic Tests and Remedial 
Procedures. Pressey (109) reviews the experiments which have been 
conducted on the use of diagnostic testing and remedial teaching at 
the college level. Pressey and Pressey (110) describe an experiment 
in the use of special reading exercises with college freshmen. 
The lowest fourth of freshman classes was given this special 
training, and marked improvement was noted both in reading and in 
general academic work. Lyon (77) describes the use of the Pressey 
Diagnostic English Test and other materials in a study designed to 
improve the English of freshmen in a School of Agriculture in a 
state university. 

Leonard (71) describes the use of diagnostic tests and practice 
exercises in an experiment in English. In general the results seem 
to favor the use of such material though it is impossible to be sure, 
since adequate controls were not employed. Woody (165) describes 
an experiment designed to determine the efficacy of the use of read- 
ing drills for the improvement of ability in problem solving in arith- 
metic. Results favorable to the use of such exercises were found. 
Rosse (119) conducted an experiment in which the Lennes Test and 
Practice Sheets were used in one group in arithmetic while another 
group was held as a control. The experiment ran for three months, 
and at the end of this time the experimental group exceeded the 
control group by an amount which was 2.4 times the probable error 
of the difference. 
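
A ratio stated in probable errors can be re-expressed in standard-error units for comparison with later practice. The sketch below is our own illustration, not part of Rosse's report: it assumes a normal sampling distribution and uses the conventional relation P.E. = .6745 times the standard error.

```python
import math

PE_PER_SE = 0.6745  # one probable error expressed in standard errors (normal curve)

def pe_ratio_to_z(ratio_in_pe: float) -> float:
    """Convert a difference stated in probable errors to standard-error units."""
    return ratio_in_pe * PE_PER_SE

def two_tailed_chance(z: float) -> float:
    """Probability of a normal deviate at least this large in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))

z = pe_ratio_to_z(2.4)   # the ratio reported for Rosse's experiment
p = two_tailed_chance(z)
print(round(z, 2))  # 1.62
print(round(p, 3))
```

Under these assumptions a difference of 2.4 probable errors is about 1.6 standard errors, a size that chance alone would produce roughly one time in ten.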

(c) New Diagnostic Tests. Symonds and Daringer (141), con- 
tinuing their studies in English expression, have published an analysis 
of common errors in sentence structure. The Bureau of Reference, 
Research and Statistics of the Board of Education of New York 
City issues at intervals bulletins for classroom teachers on diagnostic 
methods. The diagnostic tests in arithmetic (96) and in map read- 
ing (97) which have recently been devised and issued in bulletin 
form by this Bureau are especially suggestive. Leonard (72) has 
published a diagnostic test in punctuation and capitalization. It is 
designed for use in all grades from the fifth through high school. 

A score card for rating teachers and diagnosing their strengths 
and weaknesses has been worked out by Carrigan (16). It consists 
of two parts, one to be used after a supervising visit of around 45 
minutes; the other to be used after longer contacts. 


IV. DEVELOPMENT AND USE OF TESTS FOR PROGNOSIS AND 
GUIDANCE 


(a) Prediction of Success in Higher Education. The most 
important recent development in this field is the wide use of aptitude 
tests in the study of applicants for admission to medical schools. In 
the 1928-1929 session 1,552 medical and premedical students were 
tested on the Moss Scholastic Aptitude Test for Medical Schools, 
and in the 1929-1930 session 5,916 were tested. A committee was 
appointed in 1930 by the Association of American Medical Colleges 
to make further experimental study of aptitude tests. Moss (90, 91) 
finds the correlation between his test and first-year grades in medical 
colleges to be .59, and between his test and second-year grades to be 
.54. Not a single failure was found in freshmen or in sophomore 
years of students who stood in the upper decile on the test, whereas 
56 per cent of those in the lowest decile failed. He finds pre- 
medical grades to correlate .50 with first-year grades. Personal 
interviews were also studied as a means of predicting success, but they 
were found to be very unreliable. His general conclusion is that a 
combination of aptitude test scores and pre-medical grades (with 
the possible addition of the results of interviews in special cases) 
represents the best means for predicting success in medical school 
work. 

Stoddard (136) has surveyed the literature up to 1930 in con- 
nection with the problem of the degree to which college success can 
be predicted from test scores and school marks. Crawford (25) pre- 
sents data on the prediction of freshman grades at Yale. The 
examinations of the College Entrance Examination Board were 
found to correlate with first-year grades to the extent of .44. When 
various other available data, including high-school marks, are com- 
bined with the examination scores, the correlation is raised to .70. 
Nelson (94) reports that the results obtained on the Iowa State 
Teachers College Test at the beginning of the year correlate to the 
extent of .56 with first-term English grades in that college. Cor- 
relations between results on other tests and English grades are also 
given, but these are lower, ranging from .33 to .41. Hartson (48) 
reports the results of a five-year study of the use of tests for section- 
ing college students in English composition. It was found that a 
battery consisting of three brief tests identified 49 per cent of the 
students who subsequently made low grades in the course. 

(b) Prediction of Success in Teaching. Krieger (66) has made 
an elaborate study of four tests which seemed most promising for 
prediction of success in teaching. Certain new factors are suggested 
for inclusion in such tests. A plea is made that future test-builders 
in this field attempt to predict success in actual classroom instruction 
and management instead of success in professional courses. 

(c) Prediction of Success in Elementary and High School. 
Hughes (57) reports a study of Latin prognosis in which a large 
number of tests were employed and an extensive array of correla- 
tions were computed. Some of the correlations found are rather 
astonishing. Probably the most interesting fact reported is that the 
results from a battery of four achievement tests and one intelligence 
test given at the beginning of the year correlated to the extent of .90 
with the scores obtained on the New York Latin Achievement Test 
at the end of the year. The battery consisted of the following tests: 
Thorndike Word Knowledge, the New York Sentence Structure, the 
Charters Language, the Charters Grammar, and the Terman Group 
Test of Mental Ability. According to the figures reported, the 
Orleans-Solomon Latin Prognosis Test does not predict final achieve- 
ment in Latin as well as does the Terman Group Test or the Thorn- 
dike Test of Word Knowledge. Ross and Hooks (118) have made 
what appears to be a superficial survey of the literature on the pre- 
diction of high-school success. They conclude that elementary school 
records form the most satisfactory basis for predicting achievement 
in high school. Their conclusion is based more on the consideration 
of convenience of obtaining scores than on the accuracy of prediction. 
The possibility of combining test results with elementary school 
grades for the purpose of improving prediction does not receive due 
consideration. 


(d) Prediction of Success in Music and Art. Four articles have 
appeared during the year on problems connected with musical prog- 
nosis. Larson (69) applied the Seashore Measures of Musical Talent 
to high-school orchestras and music classes in which careful ratings 
of ability had already been made. He found that the group which 
was rated the lowest did not consistently show low scores on the 
Seashore test, but the highest groups did get high scores on the test. 
Stanton and Koerth (134) gave the Seashore test to a group of 157 
students at the beginning of their music course and repeated the test 
at the end of the course three years later. The correlations between 
initial and final test results range from .45 to .83 for various parts 
of the test. Nielson (99) reports a study of the Seashore Motor- 
Rhythm Test which is a new departure in the testing of musical 
capacity. Significant correlations were found between this test and 
other measures of musical ability and musical performance. 
Wilson (161) suggests three tests for use in musical prognosis: 
tonic memory, resolution, and score reading. The combined 
score on these three was found to correlate .29 with grades in 
music courses, whereas the Seashore battery correlated .25. In con- 
sidering these correlations it is of interest to note that scores on an 
elaborate questionnaire on musical background correlated with these 
same music grades to the extent of .41. 

Whitford (159) has published a textbook on art education which 
contains rather comprehensive chapters on the art tests now available. 

(e) New Prognostic Tests. Symonds (140) has completed the 
third revision of his foreign language prognosis test. This appears 
to be one of the best tests in this field. The results from two forms 
of the test combined correlate .71 with achievement in foreign 
language one year later. Luria and Orleans (76) have also pub- 
lished a test in this same field. It consists of several language lessons 
and a test with each lesson. It correlates .68 with scores in modern 
language achievement at the end of a year course. The Lee Test 
of Algebraic Ability (70) has appeared. It is still in the 
experimental stage. 

A prognosis test designed to predict teaching ability has been 
devised by Coxe and Orleans (23). By way of validation, the test 
has been correlated against tests of achievement in normal school 
work administered one year later. The correlations range from .53 
to .83. An aptitude test for nursing is being constructed by Moss 
and Hunt (92), but the measure at present is only in the 
experimental stage. 

Crockett (26) presents a new test of complex manual ability 
which purports to predict success in shop work. The correlation be- 
tween the test and performance in various shop tasks is in the 
neighborhood of .60. It correlates only .31 with the Detroit 
Mechanical Aptitude Test. Paterson, Elliott, Anderson, and 
others (104) describe in great detail the standardization of the 
Minnesota Mechanical Ability Tests. Various combinations of the 
tests in the series correlate with success in shop work to the extent 
of from .53 to .73. The reliabilities of various parts of the test are, 
on the whole, higher than for any other test in the field. They range 
from .86 to .94. Strong (139) has devised and standardized a 
vocational interest questionnaire. Norms are given only for men. 
This appears to be the best device of its kind that has been developed. 


V. DEVELOPMENT AND USE OF TESTS FOR IMPROVING MARKS 
AND MARKING SYSTEMS 


Three books on teachers’ informal examinations have appeared 
this year. Lang (67) has written for teachers and supervising prin- 
cipals a well-balanced book on modern methods of constructing and 
interpreting teachers’ informal examinations. Naturally the book 
contains a discussion of the so-called “new-type” examinations, but 
it is more than a mere discussion of these methods. Such practical 
problems for the teachers as the following receive attention: use of 
examinations for motivation in reviews, the methods of improving the 
essay-type test, methods of improving school marks, etc. He recom- 
mends the converting of test scores into school marks on the basis of 
the percentage scheme: highest 6 per cent should receive A’s; next 
25 per cent, B’s; next 38 per cent, C’s; next 25 per cent, D’s; and 
the lowest 6 per cent, E’s. No discussion is given as to what adap- 
tations of this scheme, if any, should be made in cases of special 
selection. 
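
Lang's percentage scheme can be applied mechanically to a ranked set of scores. The following is a minimal sketch under our own assumptions: the function name `assign_marks`, the rounding rule for class sizes that do not divide evenly, and the arbitrary treatment of tied scores are illustrative choices, not part of Lang's recommendation.

```python
from typing import List, Sequence

# Lang's percentage scheme as reported above: share of the class per mark,
# from the highest scores down.
SCHEME = [("A", 6), ("B", 25), ("C", 38), ("D", 25), ("E", 6)]

def assign_marks(scores: Sequence[float]) -> List[str]:
    """Return one mark per score, in the original order of `scores`."""
    n = len(scores)
    # Indices of pupils ranked from highest to lowest score.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    marks = [""] * n
    start, cumulative = 0, 0
    for mark, percent in SCHEME:
        cumulative += percent
        end = round(n * cumulative / 100)  # illustrative rounding rule
        for i in ranked[start:end]:
            marks[i] = mark
        start = end
    return marks

scores = list(range(1, 101))  # 100 hypothetical test scores, all distinct
marks = assign_marks(scores)
print({m: marks.count(m) for m, _ in SCHEME})
# {'A': 6, 'B': 25, 'C': 38, 'D': 25, 'E': 6}
```

Because the cut-offs depend only on rank, the same proportions of each mark are issued to a highly selected class as to an unselected one, which is the adaptation problem the book leaves undiscussed.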

Michell (84) discusses the use of “new-type” examinations as 
aids to instruction in history. Her two main points are, first, that 
since so much material can be covered in a short testing period such 
tests have diagnostic values; and, second, that testing can be easily 
and frequently done, thus encouraging better study habits and greater 
effort on the part of the pupils. Each of these points is elaborated in 
some detail, and suggestions are given concerning the construction of 
“new-type” tests to accomplish these objectives. 

Ruch and Rice (120) have published the 36 best examinations 
submitted in a prize contest. This collection represents the greatest 
variety of specimens of “new-type” examinations now available in 
one volume. The examination by E. Riley, entitled “Working Skill 
Test in Social Science Material,” is especially ingenious and would 
seem to be worthy of standardization. In addition to giving the best 
specimens, the authors have made an interesting analysis of all the 
examinations submitted in the contest. They find that 30 per cent 
of the test items were of the completion type, 24 per cent of the true- 
false type, 16 per cent of the multiple-choice type, and 11 per cent 
of the matching type. The remaining 19 per cent were divided up 
among 13 other types. 

Wells (153) reports an examination of the “new-type” variety 
which has been used with success in testing students’ mastery of 
psychiatry. 

Brinkmeier and Ruch (9) report an investigation of the degree 
to which the phraseology in true-false test items suggests the proper 
answer. Brinkmeier (8), in another study, finds a tendency for the 
longer statements in a true-false examination to be true. 


VI. INTENSIVE STUDY OF CURRENT INSTRUMENTS AND TECHNIQUES 


Foran (37) discusses at some length the question of validity of 
present measuring instruments. The point is made that the most 
significant limitation of current tests is the lack of knowledge about 
what the tests measure. He makes the suggestion that improvement 
may come through validation of sub-tests as wholes, as well as the 
validation of individual items. In another article (36), in which he 
studies spelling tests, he concludes that the form which the questions 
take has an important bearing on validity. He finds recognition forms 
in spelling to be relatively unsatisfactory; the “modified sentence” 
form he finds to be the best. Lindquist (75) criticizes the norms 
offered for many educational tests, particularly at the high-school 
level. He feels that variability is not properly taken into account. 
He suggests that separate norms based on different types of schools 
would be more serviceable in many cases than the norms based on 
such composites as are frequently used at present. Lincoln (74) 
discusses at some length the problem of the equality of units in the 
age, percentile, and standard deviation scaling systems. He makes 
the unusual claim that the units on a percentile scale or an age scale 
are just as nearly equal as those on a standard deviation scale. The 
established view of specialists in statistical methods is at variance 
with this conclusion.¹ An interesting debate between Kelley (63) 
and Wilson (160) has developed concerning the inclusion of many 
words in the Spelling Test of the Stanford Achievement battery 
which are far too difficult for the children in the middle grades to 
spell. The debate served to bring out clearly the advantages and 
disadvantages of steeply graded tests. 

Miller (85) has made a critical analysis of certain parts of the 
Iowa Placement Examination. He concludes, among other things, 
that the best examination for placement in school is one which em- 
phasizes fundamentals, and that the best measure of aptitude for a 


1 See Garrett, H. E., Statistics in Psychology and Education. New York: 
Longmans, Green, 1926, pp. 109-111. 

Also McCall, W. A., How to Experiment in Education. New York: 
Macmillan, 1923, pp. 94 ff. 
school subject is one which tests the student’s ability to comprehend 
reading material in that subject. Rice (116) has studied various 
leading French and Spanish tests. He points out what he considers 
to be defects in the foreign language section of the Iowa Placement 
Examination. Shank (126) has studied the types of responses which 
students are called upon to make in various reading tests. Foran and 
Rock (38) have investigated the reliability of seven silent reading 
tests. They advocate the use of the ratio between the probable error 
of estimate of a true score and the standard deviation of the distribution 
as the best single measure of reliability.¹ This ratio is 
thoroughly useless for this purpose, however, as can be seen from the 
simple fact that a test having a reliability of .20 will receive precisely 
the same rating in this formula as one having a reliability of .80. 
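
The objection can be checked numerically. The sketch below assumes Kelley's expression for the standard error of an estimated true score, the standard deviation times the square root of (r - r^2), together with the conventional factor .6745 for the probable error; the function name is ours.

```python
import math

def pe_ratio(reliability: float) -> float:
    """Probable error of an estimated true score divided by the test's S.D."""
    return 0.6745 * math.sqrt(reliability * (1 - reliability))

for r in (0.20, 0.50, 0.80):
    print(r, round(pe_ratio(r), 4))
```

Since r(1 - r) is unchanged when r is replaced by 1 - r, the ratio rises to a maximum at a reliability of .50 and gives identical values for .20 and .80, so it cannot rank tests by reliability.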

Two studies on the reliability of the Trabue French Composition 
Scale have appeared. Breed (5) investigated the extent to which 
teachers agree in scoring a given set of papers by means of the scale; 
he finds on the average a correlation of .87 between the scores as- 
signed by two teachers. Ford (39) investigated the number of 
samples of a given child’s work which would be required to measure 
accurately his composition ability. The correlation of one 10-minute 
composition with another 10-minute one was only .51. When the 
average score on four 10-minute compositions was correlated with 
the average of four others the coefficient rose to .81. 
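
Ford's two coefficients agree closely with what the Spearman-Brown prophecy formula would predict for a fourfold lengthening of the sample. The source does not say that Ford applied this formula; the sketch below is offered only as a check on the reported figures.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Predicted reliability when a measure is lengthened k times."""
    return k * r_single / (1 + (k - 1) * r_single)

# One 10-minute composition against another: .51 (as reported).
predicted = spearman_brown(0.51, 4)
print(round(predicted, 2))  # 0.81
```

The predicted value for the average of four compositions, .81, matches the coefficient actually obtained.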

Larson (68) has made a careful item-by-item study of the Sea- 
shore Measures of Musical Talent. New norms have been de- 
termined, but they differ very little from those previously reported. 
New facts on reliability and validity are given. It was discovered 
that most of the tests could be shortened somewhat without appreci- 
able loss in reliability. Coxe and Orleans (24) describe the methods 
and the results of analyzing and revising their Teachers’ Interest Test 
and their Prognosis Test of Teaching Ability. 

Scudder and Raubenheimer (125) have obtained some very in- 
teresting results from a study of the intercorrelations among the 


1 This is an adaptation of the ratio $\sigma_{\infty \cdot 1}/\sigma_{1}$, which was suggested 
by T. L. Kelley, Statistical Method, p. 215. On an unreliable test the $\sigma$ would 
be greater than on a reliable one, due to errors of measurement being introduced. 
However, the $\sigma$ in the numerator and that in the denominator would 
be equally affected by this; therefore the critical part of this formula is 
the value $\sqrt{r_{11} - r_{11}^{2}}$. 
following measures of mechanical ability: Stenquist test scores, 
MacQuarrie test scores, O’Rourke test scores, and shop teacher 
grades. The intercorrelations range from 0 to .49. These challeng- 
ing facts should lead to further research in this field to determine 
what, if any, unique capacities or achievements are being measured 
by mechanical aptitude tests. Revised median and percentile scores 
have been obtained on the Detroit Mechanical Aptitude Examina- 
tions (31). Wells (154) describes a new procedure for administer- 
ing the O’Conner “ Work sample 17” whereby the reliability of the 
test is increased. 

Peet and Dearborn (106) report revised age and grade norms 
for their survey test in arithmetic. New facts on reliability are 
also given. The reliability coefficient for the section on problem
solving was found to be .86; that for the section on fundamental
processes was found to be .95.

Morley (89) has made a study of the reliability of the accomplish- 
ment quotient. Otis and Orleans (102) have published a manual 
designed to assist teachers in transmuting test scores into terms of
school marks.


VII. BIBLIOGRAPHIES


Newland and Toops (98) have compiled a bibliography of 664
titles bearing on measurement in higher education. Woody (166)
has prepared a list of tests for use at the college level. Useful data are
given concerning each test listed. Kinder and Odell (65) have also
prepared an annotated list of tests for use in colleges and have
appended a bibliography of 321 titles bearing upon measurement at
this level. Odell (100) has published the third revision of his
annotated list of high-school tests.


MEASURES OF CHARACTER AND PERSONALITY


BY GOODWIN WATSON 
Teachers College, Columbia University 


This review of tests of character and personality aspires to be 
more selective than extensive. In the endeavor to focus the review 
and to introduce perspective into it, certain related articles have 
been eliminated. Discussions of the nature of character and analyses 
of particular traits have been eliminated from this review. Such 
indices of character and personality as crime, delinquency, scholastic 
failure, being tattooed, being elected to offices, insanity, truancy, 
reference to problem clinics, socio-economic status, living room 
equipment, and so on, although studied in one or more articles during 
1930, have not been included in this review. A very large number of 
articles involving glandular disorders have been excluded, although 
such observations have an important significance for personality. 
Measures of appreciation or creative ability in the arts have been 
excluded. Studies of religion have been excluded unless specific 
attention was given to measuring character correlates. The large 
number of ratings and tests on vocational aptitude, success in teach- 
ing, and the like have been eliminated. Studies of motivation in 
animals, while certainly involving the principles of character measure- 
ment, have been excluded. Discussions of typology, constitution, and 
eidetic personality, have for the most part been excluded, although 
mention is here made of the unusual interest in articles by 
Krasusky (86) on Kretschmer’s constitutional types in 1,100 chil- 
dren, by Zillig (171) and Thomas (147) on Jaensch’s three types. 
Measures of group life and social phenomena have not been in- 
cluded, although McCormick’s scale for measuring social ade- 
quacy (103,104) must be singled out for this special mention. The 
major exclusion was the total field of individual case analyses, from 
which most of the deductions about personality adjustment are being 
drawn. These related areas which are not quite close enough to the 
center of the problem for inclusion in this review may suggest needed 
supplementary summaries. 

Six related summaries were published during 1930. May, Harts- 
horne and Welty (111) published their review in this magazine a 
year ago. Allport and Vernon (4) reviewed the field of personality,
covering 327 titles. Bain (10) reviewed the theory and measurement
of attitudes and opinions, a matter of some 260 studies.
Boven (17) covered 50 contributions to characterology. Guilford 
and Braley (57) managed to gather more hope than this reviewer 
can from the 53 conflicting studies relating to extraversion and 
introversion. Raines (126) offered nearly 300 titles in his review 
of emotions. 

The titles of studies reviewed in this summary will be classified 
under the following headings: 


1. Conduct Measures
2. Behavior Observation
3. Characterological Indices
4. Laboratory Tests
5. Physiological Tests
6. Knowledge and Ability Tests
7. Attitude and Opinion Tests
8. Interest Reports
9. Self-Description
10. Reputation Measures
11. Combinations and Batteries


Within each section attempt is made to group studies of a kind 
together, presenting first and most prominently, the most valuable 
studies. 


I. CONDUCT MEASURES


The outstanding contribution in this field was the publication by 
Association Press of some thirty tests used by Hartshorne and May 
in the Character Education Inquiry. Many of these are conduct 
tests, notably the measurement of honesty by the Self-Scoring
Intelligence and Achievement Tests (160), Attitudes S-A Test (160),
Coördination Test (160), Self-Scoring Speed Test (160), some
activities related to a Stunt Party (160), an Athletic Contest (160)
and a series of puzzles (160). Tests of service and coöperation
included the Maller test (160) and the Kits, Envelope, and Money
Votes enterprise (160). Measures of persistence include the Maller
tests (160), Stories and Puzzles Test (160). Ability to resist
distraction is shown in the Ruggles test (160), the Speed Test I (160),
Stunt Party (160), Stories and Puzzles and Safe tests (160).
Another simple and excellent test of cheating, developed by Maller, has
been made available (109).

Conduct tests adapted to particular situations have been tried 
out, most of them in connection with some form of cheating. 
Steiner (136) showed that his seventh graders did not cheat quite 
as much as his fifth graders. Zillig (170) studied 270 public school
children and 64 boys in an orphanage by tests involving difficult 
problems on which the class falsely reported success, the taking of 
articles loaned to the children, and unjustified bragging about their 
possessions, ability, and parents. The findings agreed generally with 
the previous work in this country; that desire to succeed and to 
secure approval is quite as apt to lead to lying as is fear of punish- 
ment; that tests of moral knowledge are not good indicators of 
honest behavior; that there is some correlation between honesty and 
intelligence; and that children handicapped by poor homes and poor
environment are generally less truthful. Stoke and Lehman (137) 
tested overstatement among college students about books they were 
supposed to have read. Again the dull students were least trust- 
worthy. The writers made the interesting suggestion that this ex- 
aggeration may account for the negative correlation between grades 
and reported hours of study. An anonymous professor (7) found 
46 per cent of his freshmen and 25 per cent of his juniors raising 
the scores on a test which they marked themselves. Newcomb and 
Watson (115) found from 12 per cent to 20 per cent of graduate 
students in education taking similar advantage of the opportunity to 
raise their scores; again there was evidence that those who most 
needed the credit were most apt to use underhand methods for 
getting it. Campbell and Koch (23) found more cheating among 
100 students who had just heard two lectures on honesty than among 
a control group. Forty-four per cent of students from high schools
with honor systems cheated, whereas 31 per cent was the average 
for those without such training. 

Outside of the field of honesty few new conduct tests were de- 
scribed. Kendrew (80) tested the strength of motives in children 
three to six by their ability and willingness to pile up dominoes when 
offered food, satisfaction of curiosity, competition with others, and 
competition with their own record. In general the incentives were
much alike, but the slight differences placed them in the order just
stated. Distractions in most cases seemed to improve rather than to
decrease interest and effort. Leuba (98) found that pupils whose
performance of a task had reached a plateau improved by 50 to 60
per cent in response to rivalry, praise, social recognition, and
chocolate bars. In general those originally abler
responded with larger increases. Boys, having done poorer work 
than the girls under no-incentive conditions, made larger gains. 
Lewin and Freund (99) used a measure of persistence with simple
laboratory tasks carried out according to the inclination of 12 
female subjects. Their results pointed to faster work during the 
menstrual period, but to less inclination to carry out these activities 
and less persistence in them at this time. 


II. BEHAVIOR OBSERVATION 


Goodenough (55) and her co-workers observed the behavior of 
33 nursery school children and developed a method for describing 
units of behavior that could be quantitatively compared. Twenty-five
one-minute observations were made on each child. The
reliability of observations on compliance was .5, on talkativeness .6,
and on laughter, sociability, leadership and physical activity each
about .8. Intercorrelations among these observations and with data from
rating scales and personal histories indicated that sociability,
leadership, and talkativeness are rather closely interrelated;
intercorrelations generally above .6. Height, weight, chronological age, and
mental age are positively related to these leadership factors. It is 
interesting to observe that teachers’ ratings on individual beauty and 
attractiveness of personality showed correlations of .5 or .6 with 
Merrill-Palmer mental age. Physiological tests, health measures, 
position in the family, size and status of the family, the child’s sex, 
seemed to be unrelated to any of the measured behaviors. 

Herring (66) pursued a very careful experiment over several 
months in observing the response of nursery school children to taste 
stimuli. The observational procedures were refined until agreement 
between one series of observations and another reached correlations 
above .9. The experiment in general showed a decrease in extremes 
of behavior, both liking and disliking, tending toward neutrality 
as the stimuli were repeated. 

Newcomb’s (114) observation in camps, more fully reported pre- 
viously, was briefly reviewed. He found no consistency in the actual 
behaviors usually classified by trait names or in such types as 
introversion-extraversion. Farmer (44) observed more than 600 
boy apprentices and described five types of behavior. The correla- 
tion between the type rating and industrial proficiency measured by 
a practical test was .21. This was somewhat inferior to a battery of 
objective psychological tests.


III. CHARACTEROLOGICAL INDICES


Most of the characterological work comes to us from Germany. 
An outstanding contribution is Roemer’s (129-132) series of four 
articles. He began with the Rorschach pictures, using a stenographic
report, behavior description and detailed timing. He added the fea- 
ture of asking the subject at the close of the test to sketch the figures 
he had seen in the test blot. Drawings appeared to Roemer to depend 
very little upon training in drawing, but a great deal upon the 
liveliness of interpretation. He found students in teacher training 
institutes dull and pedantic in contrast with the humanistic Gym- 
nasium group. A further test in this series was made up of pictures 
from which the subjects were asked to choose those they liked, 
scenes in which they would like to be present, people whom they 
would like to be, types of girls they found attractive. The Gym- 
nasium students were very critical, the vocational school boys being 
attracted to many persons and situations which the other class re- 
jected. A follow-up of a few subjects six years later showed that 
their personalities retained many of the same characteristics posited 
on the basis of the earlier tests. Case studies show how the symbolic 
interpretations lead the girl suffering from insomnia, the girl 
secretly married, the dentist, the homosexual, the repressed student, 
to “give themselves away” because they did not realize that they
were reading into the picture their own mental states. One in- 
teresting report showed a variation in the same subjects at the 
beginning, middle, and end of a long mountain climb. In the final 
article Roemer describes an apparatus for recording breathing 
graphically in a natural fashion and over a long period of time. He 
shows the close correlation between breathing curves and hesitations, 
repressions, anxiety, and exclamatory feelings during the picture 
interpretations. 

Beck (11) used the Rorschach tests on 69 children at Randall’s
Island. Four cases are presented showing clear-cut differences be- 
tween one who is feebleminded and one of superior intelligence, 
between a psychotic and a behavior problem. Proportions of 
“movement” and “color” interpretations are taken to be diagnostic
of “emotional lability,” “interiority” and “exteriority.”

Krauss (87) used symbolic drawings in which subjects were 
asked to describe by line drawings how they felt when happy, 
furious, sad, or longing; how they would express the feeling of 
iron, gold, glass, wood. These were not representative drawings, of 
course. Many similarities appeared among different persons in the
representation of their emotional states. “Rage” was characterized
by heavy, angular lines; “longing” by soft lines curving upward,
and so on. Subjects who did not participate in the original
experiment correctly matched the line drawings and the named emotions in
more than 70 per cent of the cases. Struve (142) tested more than
200 adolescent boys and girls with vague pictures like the Rorschach
gray blots, an interrupted story to be completed, and a story
composed around two stimulus words. The subjects were classified as
those who merely enumerated, those who expressed an intuitive
type of imagination in which complex actions appeared spon- 
taneously, and those with fantastic or rational inventions. More than 
80 per cent of the children were consistent in the several tests, and 
the types agreed even more closely with instructors’ ratings. Boys 
were in general more inventive than girls. 

Wolff (167) tried to separate out from the total personality cer- 
tain definable indices. In one experiment subjects spoke a sentence 
onto a phonograph record and also wrote the same sentence. The 
proportion of correct matching by judges was five to two.
Personality descriptions matched with handwriting were correct in 41
cases where chance would have provided 26. A photograph profile 
matched with handwriting showed 24 correct with 20 likely to be 
correct by chance. Lembke (95) compared the drawings of 17 
bold pupils with those of 17 shy pupils and found the latter to 
use sharper delineation and brighter colors. Ziehen (169) reviewed 
the whole field of characterology, suggesting research methods and 
techniques in character diagnosis. 
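
One simple way to read Wolff's matching results is as the ratio of observed correct matches to the number expected by chance. The sketch below uses only the counts quoted above; the labels and the ratio itself are illustrative choices rather than Wolff's own statistics, and because the total number of matchings attempted is not given here, no significance test is attempted.

```python
# Observed correct matchings and chance expectations, as quoted in the review.
wolff_results = {
    "personality description vs. handwriting": (41, 26),
    "photograph profile vs. handwriting": (24, 20),
}


def ratio_to_chance(observed: int, expected_by_chance: int) -> float:
    """How many times the chance expectation the observed count represents."""
    return observed / expected_by_chance


ratios = {
    label: round(ratio_to_chance(obs, exp), 2)
    for label, (obs, exp) in wolff_results.items()
}
for label, ratio in ratios.items():
    print(f"{label}: {ratio} times chance")
```

On this reading the handwriting-to-description matching stands well above chance (about 1.6 times), while the profile-to-handwriting matching barely exceeds it (1.2 times).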

Association tests are the characterological approaches most com- 
mon in the United States. Meltzer (112) asked 132 students to 
describe the experiences of their Christmas vacation, and found the 
average student remembering eleven pleasant experiences and seven 
unpleasant ones. Six weeks later 60 per cent of the unpleasant and 
43 per cent of the pleasant experiences had been forgotten. The 
individual differences in forgetting led him to classify his group into 
optimists, pessimists, and indifferentists. Estabrooks (39) made an
important contribution to technique by showing that a suggestion of 
sex responses in advance of a free association appeared not markedly 
to increase the proportion of such associations. Good (54) used 
the number of words furnished by a patient in three minutes not only 
to show mental level but also to show repressions and, when the 
words were classified as male, female, or neuter, to indicate the sex
of the dominant parent. Noh and Guilford (118) asked 100 college 
men and women to write 100 words as rapidly as possible. When 
these words were classified, they indicated that the men used more 
verbs, more abstract forms, more words dealing with implements 
and occupations; whereas the women more often referred to words 
describing clothing, buildings, education, art, kinship, and music. 
There were no great differences in speed of writing or in per cent of 
unique associations. Kent and Wells (81) standardized story com- 
pletion tests on over 400 children, emphasizing particularly the in- 
tellectual measure, but recognizing the possibility that such tests 
might reveal something of personality. Several studies of graphology 
were presented (20, 30, 75), the first one analyzing in particular the 
third-dimensional factor of pressure. Two studies (48, 77)
demonstrated, as is usual, that character indices based on skull
measurements and judgments of teaching ability based on photographs are
worthless. Paterson (122b) summarized the matter in these words, 
“With the possible exception of physical factors associated with 
temperamental characteristics and of disease processes involving the 
higher centers of the central nervous system, our survey has demon- 
strated that prevalent notions regarding an intimate relation between 
bodily traits and mental development (personality included) have 
been greatly exaggerated.” 


IV. PHYSIOLOGICAL INDICES


The psycho-galvanic reflex has claimed the lion’s share of atten- 
tion in this field. Thouless (148) presented careful experiments 
with methods designed to distinguish between the PGR and the 
phenomenon of Tarchinoff, the latter being the change in current
passing between two electrodes on the skin when no external elec- 
trical current is introduced. He is inclined to believe that these 
phenomena are physically distinct and independent, although both 
are usually combined in less careful determinations. Landis (89) 
and Wang (156) both reviewed previous studies and came to the 
conclusion that the psychogalvanic reflex is one of a complex of 
autonomic reflexes under the control of the sympathetic division and 
associated with such other changes as sweat secretion, vasomotor 
changes, and muscular responses. Both are inclined to doubt the 
wisdom of interpreting PGR changes as indicative of emotional 
reactions. Phenomena appear with or without conscious emotional 
concomitants. Davis (32) showed that the size of the deflection
depended upon the size of the measuring current, and found that the
stronger stimulus gave a shorter latent period. Abel (1) found
fifteen times as many PGR deviations in response to true-false 
problems which subjects reported as hard, as occurred with the 
presentation of easy problems. Patterson (124) found surprise one 
of the emotional states that might be accompanied by the reflex, but 
agreed with other studies in finding that the form of the reflex 
response did not indicate the nature of the feeling. Odegaard (120) 
found 344 psychiatric patients giving more unstable, irregular, and on 
the whole less active reflexes than were given by 182 normals. The
more serious conditions showed more atypical reactions. Organic and
schizophrenic psychoses showed large decreases in reactivity.
Jones (78) applied the PGR to eight babies, three to eleven months 
of age, and found the curves showing the same characteristics that 
have been reported in adults. Startle and frustration were the most 
effective stimuli. Initial resistance was lower than with adults, and 
did not rise during sleep. There was suggestion that the children 
who were most active in emotional expression, who cried most easily, 
showed the least resistance on the galvanometer. Estabrooks (40) 
found that the body resistance of adults increased during hypnosis, 
as it has previously been reported to do during sleep. This response 
was rather quickly conditioned to the presence of the operator, and 
appeared without development of anything like the hypnotic trance.

Rackley (125) used both the psychogalvanic reflex and the blood 
pressure measures, discovering a low but positive relation between 
them. For his ten subjects, fear-producing stimuli caused larger 
changes than were produced by mental work. Evans (41) reported 
galvanic, heart and breathing changes but found no significant cor- 
relations with intelligence or academic grades. Body resistance 
showed a correlation of -.42 with intelligence among 50 college
students. Scott (133) showed a motion picture to 100 men and
recorded systolic blood pressure. Sex emotion aroused by a dancing
girl brought about a rise in systolic pressure, but such stimuli as a
flogging scene and the cataclysmic destruction of a city showed no
characteristic response. Skaggs (135) found pulse rate least in 
the relaxed condition, increasing during the expectation of a shock, 
greatest following an unexpected loud noise. Breathing was most 
shallow during mental multiplication, increased in amplitude during 
relaxation, still further during anticipation of a shock, and was at 
its peak in the relief following the shock. Furukawa (50) continued
his studies on blood groups, using in this case self ratings on 11
temperamental characteristics. He is inclined to believe that persons
of Blood Group O are phlegmatic; those of Blood Group A,
melancholic; those of Blood Group B, sanguine.


V. LABORATORY TESTS


Duffy (37) demonstrated that there are interesting possibilities 
in the measurement of muscular tension in grip. Nursery school 
children held a bulb in one hand and used the other to make a 
discrimination reaction. The individual differences were constant 
during the eleven days of the study. A positive correlation existed
between degree of tension and the teacher’s estimate of tendency to 
excitability. Shape of the line and degree of variation from it 
appeared to be more significant than merely average tension. 
Andrews (5) studied imagination, again with nursery children. A 
tachistoscope presented partial stimuli to cause recall of past ex- 
periences. Number of suggestions and quality of imagination showed 
a high correlation (.87). There was little relationship between 
intelligence or chronological age and the amount of fantastic imagina- 
tion. Previous studies have failed to show any relationship between 
laboratory tests and the ill-defined concept of introversion or extra- 
version. Washburn’s (157) study of 42 college women is no excep- 
tion. Reaction time, flicker sensitiveness, extremes of liking and 
disliking rather than indifference to color or nonsense syllables, 
showed the negligible differences that might have been expected. 
Hull (71) has standardized phonograph records for hypnosis in a
way that may furnish an objective situation for measuring one form
of suggestibility. Wollstein (168) presented a battery of 20 
laboratory tests and recorded personality reactions which appeared 
during the testing of two individuals. 


VI. KNOWLEDGE AND ABILITY TESTS


Social intelligence still calls forth some attention (18, 139). In 
agreement with previous investigators, Strang (139) found that 
correlations with other measures of general intelligence are more 
significant than correlations with success in being a dean of women, 
scores on the Gilliland Sociability Questionnaire, teaching experience, 
or other indices of social ability. 

Association Press has published a Moral Information Test (160)
covering knowledge of cause and effect in personal relations, proper 
identification of acts of cheating, lying, and stealing, and extent of 
ethical vocabulary; reliability about .80. 


VII. ATTITUDES, OPINIONS, AND BELIEFS


Opinion Ballots A (160) and B (160) developed by the Character
Education Inquiry record the judgment of children on what
is their duty; on the best act in certain common situations; on acts
which should be called right, excusable or wrong; the truth of ethical
maxims; the probability and importance of consequences following
certain common behaviors; feelings about school, choice of
companions; typical interests, ideas of the justification for acts usually
considered wrong; ideas of success; preference among activities, and
so on. Reliability of Ballot A is .53 and its correlation with mental
age .60. Reliability of Ballot B is .81 and its correlation with 
mental age .54. 
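
The low reliability of Ballot A limits how high its correlation with any other measure can be. One conventional way to see this is Spearman's correction for attenuation, which the review itself does not apply. The sketch below assumes, purely for illustration, that mental age is measured without error (a reliability of 1.0), so the corrected values are conservative estimates and not reported findings; the function name is hypothetical.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float = 1.0) -> float:
    """Spearman's correction: observed r divided by the root of the two reliabilities.

    rel_y defaults to 1.0, an assumption made here because the review
    gives no reliability for the mental age measure.
    """
    return r_xy / math.sqrt(rel_x * rel_y)


# Figures quoted in the review for the two Opinion Ballots.
ballot_a = correct_for_attenuation(0.60, 0.53)
ballot_b = correct_for_attenuation(0.54, 0.81)
print(round(ballot_a, 2), round(ballot_b, 2))
```

Under this assumption the less reliable Ballot A would show the stronger underlying relation to mental age (about .82 against .60 for Ballot B), which suggests that the two observed coefficients are not directly comparable without taking reliability into account.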

Dudycha (34) gave to all freshmen entering Ripon College in 
1929 a questionnaire including moral beliefs, among others. Ninety- 
five per cent believed that one should always pay his debts; 79 per 
cent that one should never make statements which are intentionally 
misleading; 76 per cent that it is wrong to have promiscuous sex 
relations; 70 per cent that it is wrong for women to use alcohol; 49
per cent that one should always obey his parents; 39 per cent that 
cribbing in an examination should be reported; 24 per cent that it 
is morally wrong for women to smoke, and 9 per cent that it is 
morally wrong for men to smoke. 

Religious opinions and attitudes are recorded in the Test of 
Religious Thinking (160) which is available in an elementary and 
an advanced form, in five of Thurstone’s scales (150) dealing 
with attitudes toward God, and again in the questionnaire used by 
Dudycha (35). This last named showed a reliability of .93 between
statements presented in one form and the negative form, and showed
Ripon College students accepting about 60 per cent of the orthodox 
doctrines. 


Most of the attitude and opinion scales deal with social questions. 
Three parallel forms of the test for measuring ethical outlook upon 
economic questions have been prepared by Schultz (160). Inter- 
national attitudes are measured by Harper’s three-hour test (160), 
the Test of Opinions on International Questions (160), Neumann's 
test (116), and the Droba Scale of Attitudes Toward War (33).
Race Attitudes are measured in the test of Race Attitudes (160), 
the Hinckley Scale for measuring Attitude toward the Negro (69), 
and in five attitude tests presented in Lasker’s book (91) and which 
are published by The Inquiry, 129 East 52 Street, New York City. 
Willoughby (164) sampled student opinion at Stanford University 
and found about 99 per cent satisfaction with the college, 50 to 60 
per cent favoring recognition of Russia, 20 to 40 per cent planning 
to enter business rather than a profession, 70 per cent of the men 
and 90 per cent of the women favoring a single moral standard, etc. 
There was no very clear influence of college experience upon such 
attitudes. Kornhauser (83) tested students before and after a course 
in economics, finding improvement unrelated to intelligence, but 
varying inversely with original score. Apparently the students be- 
came more scientific and more liberal in their answers. Lock- 
hart (101) found that 3,500 children agreed with his ideas about 
law enforcement about as well as did adults. Blanchard and 
Manasses (15) kept up the Stanley Hall tradition by a book based 
upon 252 answers to questionnaires given adolescent girls and 
covering topics of sex adjustment, inferiority feeling, ideals, voca- 
tional plans, and so on. Thurstone (149) reported a scale for 
measuring attitude toward the movies. The University of Chicago 
Press lists (150), in addition to those already published, a scale of 
Attitude toward Birth Control, and 26 other scales planned or in 
preparation. Three studies (34, 53, 102) based on superstition
among high school or college students indicated that the common 
expressions regarding luck are not often believed. Among the 200 
statements submitted by Lundeen and Caldwell (102) to 900 high 
school students, the only statement believed by more than one-half 
of the group was that the winters are less severe than they used to 
be thirty or forty years ago. About one-third of the group con- 
curred in the next most popular statements, which dealt with indi- 
cation of a heavy winter by birds’ plumage, extra supplies of honey, 
heavy coats of fur, or unusual accumulations of nuts by squirrels. 
In this as in previous studies girls are slightly more credulous than 
boys, persons from small towns more so than people from cities, high 
school students more so than college students; although Lundeen 
and Caldwell found no correlation between superstition and age. 
Lentz (96) proposed an opinion test score to indicate conservatism,
acquiescence, and variability, with reliabilities of about .70.
Weinland (163) discussed the use of proverbs as a possible test for
differentiating male and female attitudes, conformists and variants.


VIII. INTERESTS


There are five commonly accepted methods of measuring in- 
terest. The most common and least valuable is probably some 
instrument which asks people what they are interested in. There is 
something of an improvement in the measures which get at interests 
indirectly through a series of likes and dislikes, the purport of which 
is not obvious to the person answering. Report by an individual of 
what he actually does is sometimes used as an indication of interest.
The fourth type of measure is an information test based on the as- 
sumption that those who know most in a given area must have the 
most interest in it. Actual observation of the individual’s behavior 
is the last and best measure. Each of these types was represented 
by some studies during 1930, and these will be presented in the 
reverse order from that suggested above. 

Hulson (72) made detailed consecutive observations on ten four- 
year-old children half an hour on every school day throughout the 
year. Blocks were most often chosen, used for longest periods of 
time, and most apt to be used by several children. Sand table and 
house corner also stood high. Dolls, animals, and blackboards rank 
low on these measures. The average length of time spent on an 
activity varied in the case of different children from seven to twenty- 
seven minutes. Ehrle (38) observed 100 children two to seven years 
of age, and on the basis of their behavior classified their predominant 
values as sense-pleasure, egoism, economic, or social. About 20 per 
cent he found to be a pure type, strongly influenced by one value 
only. Waples (156) compared the reading interests of adults as 
marked on a questionnaire with actual reading, and found wide dis- 
crepancies. The questionnaire reports revealed interest differences 
along social, geographical and educational lines. 

The only information test for measuring interests was described 
by McHale (107) as given to 133 Goucher College students. The 
test involved familiarity with such items as the Seguin form board, 
the findex system, pruning, the Hippocratic oath, and a codicil. Only
24 per cent were following two years later the vocation they chose
while a college junior. Among those who had done so there was
a correlation of about .7 between the test results and success as 
estimated on a five-step scale by the employer. 

Time schedules may be used as interest measures, and were so
used by Coy (29) in relation to thirty gifted children and by 
Andrews (6) with students at the North Carolina College for 
Women. Coy found no constant difference in time schedule between 
the bright and the extremely bright. Andrews found freshmen 
spending much more time on curricular than on extra-curricular 
life, but that for seniors the proportions were about even. Stoke 
and West (138) asked 36 observers to record on a check list 
topics which they heard discussed in free conversation among 
students. Sex and dates made up the largest category. The total 
percentage of topics classifiable as artistic or intellectual was only 
about 15 per cent; as impersonal social comment, one-tenth of 1 
per cent. These observations led the authors to reflect sceptically 
about the value of the college dormitory bull session. Lehman and 
Witty (92-94) are still squeezing drops of juice from their play 
questionnaire. The studies published this year suggest that boys 
carry on more motor activities than do girls; that accelerated chil-
dren read more than retarded children; that retarded children select
more play activities of a religious nature; that pubescence is 
accompanied by loss of interest in childhood activities. 

The most significant measure developed during the year in the 
field of interest is probably Garretson’s questionnaire (51) for dif-
ferentiating technical, commercial and academic inclinations of ninth-
grade boys. Reliabilities for the scores were about .9, and bi-serial
r’s between the test and enrolment or non-enrolment of pupils
of above-average success, as indicated by school marks, were .9 for
the technical group, .7 for the commercial and .6 for the academic.
Correlations with objective measures of mechanical, clerical and 
academic ability were close to zero. Symonds (145) made the in- 
teresting point that these interest questionnaires show the type of 
curriculum into which the individual should go, whereas the objec- 
tive tests of ability indicate the degree of success he will probably 
obtain in it. Langlie (90) confirmed the usual understanding that 
pupils are more apt to get their best grades in subjects they like 
best. The Strong Vocational Interest Test is being widely used and 
was discussed in two articles (140, 141), one of which presents the
formula by which “like,” “dislike,” and “indifferent” were weighted
to yield reliability coefficients between .75 and .90. Among Stan-
ford University seniors two years after testing, 50 per cent were in
occupations in which they had rated highest on the test, and 71 per
cent were in occupations in which they had rated first or second.
A like-dislike test was used in the new Minnesota series of mechani- 
cal ability tests. Morris (113) described in an article, her disserta- 
tion, previously published, a test of likes, of knowledge of tactful 
response, of the best reaction to teaching situations, of feelings about 
such situations, etc., which correlated .46 with practice teaching 
grades when intelligence and academic averages were held constant.
Ullman (153) found teaching success closely related to grades in 
practice teaching and to socio-economic status, but little related to 
self-ratings, knowledge of objectives and principles of education, or 
teaching interests on the Strong blank. 
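
Morris's figure of .46 is a partial correlation, that is, the relation
remaining between test and practice teaching grades once intelligence
and academic average are held constant. A minimal sketch of the
standard recursive formula follows, in Python. The function name
partial_corr and every correlation value in the example are
hypothetical illustrations; none is taken from the studies reviewed.

```python
def partial_corr(r, i, j, controls):
    """Partial correlation between variables i and j, holding `controls` constant.

    r        -- symmetric matrix (list of lists) of zero-order correlations
    controls -- list of variable indices to hold constant
    Uses the usual recursion: remove one control variable at a time.
    """
    if not controls:
        return r[i][j]
    k, rest = controls[0], controls[1:]
    r_ij = partial_corr(r, i, j, rest)
    r_ik = partial_corr(r, i, k, rest)
    r_jk = partial_corr(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / ((1 - r_ik ** 2) * (1 - r_jk ** 2)) ** 0.5


if __name__ == "__main__":
    # Hypothetical zero-order correlations among four measures:
    # 0 = interest test, 1 = practice teaching grade,
    # 2 = intelligence, 3 = academic average.
    r = [
        [1.0, 0.5, 0.5, 0.5],
        [0.5, 1.0, 0.5, 0.5],
        [0.5, 0.5, 1.0, 0.5],
        [0.5, 0.5, 0.5, 1.0],
    ]
    print("first-order  r01.2  =", round(partial_corr(r, 0, 1, [2]), 3))
    print("second-order r01.23 =", round(partial_corr(r, 0, 1, [2, 3]), 3))
```

Each added control lowers the coefficient in this example because the
controls share variance with both measures; with controls unrelated to
either measure the partial and zero-order values coincide.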

An interest interview with a blank for guidance was used with a 
great deal of success in counseling by McFarland and Son-
quist (105). Five studies (3, 79, 128, 143, 156) asked children or
adults about their favorite books, work, recreation, companions, pur- 
chases, movie actress, type of moving picture, etc. Antipoff (8) 
compared results with investigators in other countries. Moving 
picture interest studies appeared consistent in reporting preference 
for mystery, comedies and active Western pictures. Interest in love 
themes increased with grade advancement among high school 
children. 


IX. SELF-DESCRIPTION 


Very intriguing devices are being developed to record with some 
subtlety the opinion which an individual holds of himself. Most of 
the instruments are dependent upon frank cooperation. Not all of the 
reports give adequate recognition to this fact. The validity of any 
self-report measure lies in the conditions under which it is used quite 
as truly as in the blank itself. Thus in the S-A Test (160) some 25 
per cent of the pupils in an average grade school will report un- 
warranted virtues for themselves so numerous as to be beyond three 
S.D.’s from the mean of an honest group. Maller (108) reported 
that papers upon which children have signed their names show more 
cooperation for the class, and less tendency to give themselves de-
sirable ratings. Flory (46) found a correlation of .5 between self-
rating and average ratings by several friends on a scale of 25 traits
related to teaching success. 

In some of the tests the scoring has been disguised or compli- 
cated so that the subject can hardly distort his score by foresight of 
what will stand to his credit. Best of these, perhaps, is the (Sweet) 
Personal Attitudes Test for Boys (160), which appears to be a
three column checking of likes and dislikes, but which yields scores 
related to self-criticism, criticism of others, feelings of difference, 
sense of superiority, inferiority, social insight, and deviation from 
the group idea of the right. Reliabilities by split-half methods 
ranged from .76 to .94 for these scores. Most useful for personal 
counselling is the Rogers Test for Diagnosis of Emotional Malad-
justment in Children Nine to Twelve (160), yielding diagnostic
scores for personal inferiority, social inferiority, family relationship 
problems, and day-dreaming. The Burdick Apperception Test (160) 
is offered as a measure of cultural background in the home, well 
validated in the course of the studies made by the Character Educa-
tion Inquiry. It is related to the Sims Score Card (r=.51), to
occupational status (r=.48), to intelligence (r=.48) and especially
to home ratings by case visitors (r=.66).
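
The split-half reliabilities quoted above (.76 to .94) rest on
correlating two halves of the same test. A minimal sketch follows,
assuming an odd-even split and the Spearman-Brown step-up to full test
length. Whether the reported figures were so corrected is not stated
in the source, and the function names and item data are hypothetical.

```python
def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when a variable has no variance")
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation of two half tests."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of item scores per subject.

    Items in positions 1, 3, 5, ... form one half and items in
    positions 2, 4, 6, ... the other (an odd-even split).
    Returns (half-test correlation, stepped-up reliability).
    """
    first_half = [sum(row[0::2]) for row in item_scores]
    second_half = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(first_half, second_half)
    return r_half, spearman_brown(r_half)


if __name__ == "__main__":
    # Hypothetical right/wrong (1/0) scores of five pupils on six items.
    scores = [
        [1, 1, 1, 1, 1, 0],
        [1, 0, 1, 1, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [1, 1, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 0],
    ]
    r_half, r_full = split_half_reliability(scores)
    print("half-test r:", round(r_half, 2), " stepped-up:", round(r_full, 2))
```

The step-up matters when comparing coefficients across reports: a
half-test correlation and a corrected full-length figure are not on
the same footing.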

Faterson (45) reported a scale of 94 self-rating items for 
measuring inferiority attitude, with a reliability among college 
students after six weeks of .73. Women showed more inferiority 
feeling than men (5 P.E. diff.), men in education or dentistry showed 
more than men in business or engineering, women in music or medi- 
cine showed more than women in home economics or nursing. A 
“worries scale” containing many of the same items in a different 
form showed a correlation of about .5 with inferiority feeling. In- 
feriority showed a correlation of about the same magnitude with 
the Woodworth Personal Data Sheet, and of about .3 with the 
Heidbreder Introversion Scale. In the same study an interest test 
was created having a reliability of .6 or .7 and a correlation with 
the inferiority measure when applied to a new group of .4. Correla- 
tion of inferiority feeling with intelligence and college marks was 
negligible although the children of semi-skilled and unskilled laborers 
indicated much more inferiority feeling than did the children of the 
managerial group. 

Watson (158) found that self-estimates of happiness were con- 
sistently reported (r, split halves, .84) by graduate students and as 
a result of that study and another by Sailer, still unpublished, a re- 
vised test for recording happiness rating is now available (160). 
The original study showed happiness bearing no relation to intelli- 
gence or academic achievements, but significant relationship to sex 
adjustment, self-confidence, absence of sensitiveness, superior health, 
satisfaction with work, etc. Fairchild (42-43) estimated from case-
interviews the happiness of men in the metal trades and found correla-
tions of .5 or .6 with skill measures, skill appearing more signifi-
cant than wages or hours in bringing satisfaction. Hall (59) likewise
measured job-satisfaction, this study being in terms of reaction to 
the disagreeable features of the work. Cason’s Annoyance Test (24) 
gives a record of the proportion of experiences found irritating.

Thurstone’s Personality Scale (151) is made up of 223 questions 
compiled from previous lists used by Woodworth, Laird, Freyd and 
Allport.* Reliability is reported as .95. On every question more of 
the most neurotic 7 per cent responded unfavorably than so re- 
sponded among the least neurotic 7 per cent, the extreme groups 
having been selected by total scores. Women showed 44 symptoms 
to the average man’s 37. Slightly more symptoms were reported by 
non-fraternity than by fraternity students, more by Jews than by 
Gentiles, more by good than by poor mark-getters, although correla-
tion with intelligence was .04. Heidbreder published a Personal
Traits Rating Scale (64) and a study (65) showing general agree- 
ment between traits approved for self and for others. Sex dif- 
ferences pointed to women being more apt to prefer introversion, 
and to recognize inferiority feeling in themselves. Richmond's 
Psychotic Questionnaire is published (127). Weber (161) discusses 
an emotional age scale correlating .4 with M.A. and .5 with C.A.
Watson (159) offers an unstandardized scale for rating and
diagnosing home discipline. Leonard (97) reports a questionnaire 
with over 800 responses from girls dealing with what they tell their 
mothers. 

*The Bernreuter Personality Schedule published by Stanford University
Press combines a measure of neurotic traits, a measure of introversion, of
ascendance, and a new phase called “adequacy” in one instrument. This useful
instrument for replacing the separate scales like the Thurstone, Allport, Colgate,
etc., was not published until 1931 and strictly speaking belongs to next year's
review.

Symonds (144-146) with Jackson (146) made an inclusive ap-
proach to the discovery of maladjusted pupils, using an autobiography,
a group intelligence test, a questionnaire of 175 yes-no items covering
the pupil’s own attitudes toward school-work, teachers, pupils, home,
personal affairs, etc., an identification sheet of the “Guess Who”
type filled out by fellow pupils, and a reputation sheet for teachers’
ratings. Failure in marks continued to surpass any of these as
indicative of the pupils considered by teachers to be problems. The
pupil’s own report correlated (biserial) .38 with being a problem; his
fellow-pupils’ reports gave a similar result (r=.31). The most
maladjusted pupils were those who reported themselves as dis-
criminated against in class, wanting more electives, disliking sub-
jects, disliking teachers, burdened with too much home work, dis-
liking examinations, etc., and who were reported by fellow pupils as
grimacing, day dreaming, never volunteering, bullying, pushing, 
making fun of others, etc. The questionnaire presumably catches 
the seclusive, withdrawing type, the rating scale the boisterous at- 
tention-getting type. Keys and Whiteside (82) averaged Wood- 
worth-Cady symptom score and teacher’s rating on emotional 
instability (the correlation between the two measures being .53) and 
found the extreme groups differing in that the stable pupils were 
seven months younger, 18 points higher in I.Q., two years ahead in 
mental age and educational age. A.Q.’s were 100 for each group. 
Asher and Haven (9) compared 594 public school boys with 249 
boys of similar age, twelve to eighteen, in the Kentucky Houses of 
Reform. No significant differences appeared in total scores although 
the delinquents were more apt to report a strong desire to steal, a 
desire to run away from home, need for a light in the room at night, 
truancy, and fear of thunder-storms. The list is very nearly the 
same as that obtained by Slawson some years ago. Evans (41), 
Gilliland (52), and McGeoch (106) add to the evidence that quanti- 
tative scores on emotionality tests have very little significance for 
school success. Beckman and Levine (12) found the Ascendancy- 
Submission Scale and a directions test of some merit in distinguishing 
between meter readers and city manager executives, but introversion 
(Colgate variety) was irrelevant. Extraversion (Neymann-Kohl- 
stedt variety, 117) was characteristic of the bed-ridden group. 


X. REPUTATION 


Hartshorne and May discussed new devices for rating char- 
acter (110) and mentioned particularly the Guess Who Test (160) 
on which pupils in a class rate each other, the Check List (160) 
which gives surprisingly good results with merely a list of adjectives 
upon which the teacher checks those applying to the child, the Con- 
duct Record (160), a blank for recording the extent to which a 
child does approved things, and the Portrait Matching Scale (160), 
a device suggesting ten verbal portraits of children ranging from 
most to least unselfish, to be used as is the Guess Who Test. The 
Haggerty-Olson-Wickman behavior rating schedules (58) are now
published, as is Olson’s monograph (122) on the development and 
use of the scales. The Merrill-Palmer standards of physical and 
mental growth (165) include percentile norms for nursery school 
children rated on energy, mental effectiveness, emotional control, 
ease of social adjustment, etc. The Seattle public schools (134) 
devised a report card in which the A, B, C, D, and E pupils were 
carefully described to make, in effect, a portrait matching scale.
Adams (2) presented a clear-cut method for determining how ob- 
jective or subjective any scale may be. In perfectly objective scales, 
consistency among different persons is the same as consistency when 
the same person repeats his observations. In proportion as the 
self-consistency becomes greater than group-consistency the scale be-
comes subjective, i.e., constant errors appear that are related to the
judge. This ratio appears worth reporting on any type of measure- 
ment in which the suspicion of subjectivity enters. It is discussed 
here under reputation measures but would be as truly applicable to 
laboratory or behavior observation tests. It may prove as useful
as reliability figures.
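
The comparison Adams describes can be sketched in a few lines. The
exact formula he used is not given in this review; the ratio of mean
between-judge correlation to mean repeat correlation of the same judge
is an assumption chosen to match the verbal description, and the
function names and ratings are hypothetical.

```python
from itertools import combinations


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def objectivity_ratio(ratings):
    """ratings: dict mapping judge -> (first ratings, repeat ratings).

    Self-consistency  = mean correlation of each judge with himself on repetition.
    Group-consistency = mean correlation between ratings of different judges.
    Returns group-consistency / self-consistency (assumed form of the index):
    near 1 for an objective scale, falling below 1 as judges agree with
    themselves better than with one another.
    """
    self_rs = [pearson(first, repeat) for first, repeat in ratings.values()]
    group_rs = []
    for judge_a, judge_b in combinations(ratings, 2):
        for xs in ratings[judge_a]:
            for ys in ratings[judge_b]:
                group_rs.append(pearson(xs, ys))
    self_consistency = sum(self_rs) / len(self_rs)
    group_consistency = sum(group_rs) / len(group_rs)
    return group_consistency / self_consistency


if __name__ == "__main__":
    # Two hypothetical judges rating the same five children twice.
    # Each judge repeats himself exactly, but the two judges disagree.
    ratings = {
        "judge A": ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]),
        "judge B": ([2, 1, 4, 3, 5], [2, 1, 4, 3, 5]),
    }
    print("objectivity ratio:", round(objectivity_ratio(ratings), 2))
```

A high reliability figure alone would not expose the disagreement in
this example, since each judge is perfectly consistent with himself.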

Reputation measures have been used by Turney (152) to discover
personality traits related to superior achievement (higher than I.Q.
would suggest) with the usual results: achievers are more punctual,
more regular in attendance, rated higher (by the same teachers who
mark them) in industry, perseverance, dependability, ambition and
interest in school work. In other words, they are successful in the 
eyes of the teacher. Herriott (67) evinces his faith in statistics by 
using tenth-order partial correlations to determine that the instruc- 
tor’s ratings on perseverance and evaluating attitude are more closely 
related to marks in education than are ratings on cheerfulness. 
Ratings of teachers by pupils are reported by Boardman (16), 
Clem (25), and Light (100). In the last-named study the best of 28 
teachers was ranked first or second by 81 per cent of the pupils 
mentioning him, while the poorest was so ranked by less than 10 
per cent. Boardman found that ratings on teaching efficiency, liking 
for the teacher, ability to make pupils work hard, ability to discipline, 
and amount pupils learn, all measured essentially the same attitude. 
He found agreement between ratings given by pupils, by supervisors,
and by fellow-teachers in the neighborhood of .6 for each inter- 
correlation. 


XI. COMBINATIONS AND BATTERIES


Hartshorne and May, with Shuttleworth, published during 1930 
the third and final volume of the work of the Character Education 
Inquiry, Studies in the Organization of Character (61). The first 
part of the volume presents the moral knowledge and opinion tests 
mentioned above, shows their intercorrelations to be close to .65, 
reviews some evidence previously published in the monograph “Test-
ing the Knowledge of Right and Wrong,” describes Patterson’s
“Foresight of Consequences” test, shows relation of moral knowl-
edge scores to school marks to be about .4, to intelligence .6, to
emotional stability .3, to honest conduct .39, persistent conduct .24, 
helpful conduct .20 and inhibited or controlled conduct .15. No 
comparisons were available between knowledge about specific forms 
of behavior and performance of those behaviors. All comparisons 
were statistical and in terms of total scores. Relation between all 
knowledge and all conduct tests is shown in Part II to be .12 in one 
population if scores be deviations from class means, or .55 if devia- 
tions are measured from the mean of the population tested, or .84 if 
class means are correlated, illustrating the dependence of the relation- 
ship upon factors common to the group but not similarly related in 
individuals. Intercorrelation among tests is shown to increase regu-
larly from .00 if specific tests are used and scores represented in
deviations from the class mean, as broader combinations of tests are 
used and classroom means are correlated, reaching .60 for the 
intercorrelations of class means in all three populations on traits 
made up of many single tests. The average interrelationship within 
a trait and the intercorrelation of traits are about equal (r=.2),
suggesting very little value for the trait unit in measuring character.
Part III brings together all the available tests into composite
portraits, judged for all-round character value by 63 judges. On 
this basis total character is best predicted from the conduct 
record (160) (r=.72), next best from the check list (160) (r=.66),
both of them reputation measures. Other measures correlating above 
.) with total character thus judged were scholastic marks, deport- 
ment marks, the Guess Who Test, Opinion Ballot A combined with 
Opinion Ballot B, and total service score. Chronological age (—.07) 
and the Sims Score Card for Socio-Economic Status (r=.14) were 
least related. This extraordinary preference for reputation measures 
probably reflects the fact that the most vivid items in the portraits 
were not the conduct scores, statistically expressed, but the adjectives 
used by teachers and pupils in their reports. Part IV discusses
“integration,” using the term in a special sense to mean consistency
or absence of variability in scores on different measures. A pupil
who was poor in one measure was required to be equally poor in all
others to be “well integrated” as the term is here used. The con-
tributions made to the theory of character are summarized in con-
cluding chapters, also in two articles (62, 63). These summaries
stress: the theory of specificity; the lack of relation between knowl- 
edge and conduct; the significance of group morale; the lack of 
character significance in age, sex, and health; the correlation of 
about .6 between character and intelligence; the resemblance of 
siblings (r=.5); the impotence of most character building pro-
grams; the close relationships among friends (r=.7); the importance
of a unified and integrated environment. 
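
The three knowledge-conduct coefficients reported for Part II (.12
from class-mean deviations, .55 from population-mean deviations, .84
between class means) show how one body of scores yields different
correlations according to the unit of analysis. The sketch below
reproduces the pattern, not the figures, on simulated data. It assumes
each class contributes one shared factor to both scores while
individual variation is independent; the data, parameters and function
names are all hypothetical.

```python
import random


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_classes=30, n_pupils=30, seed=0):
    """Return a list of classes; each class is a list of (knowledge, conduct)."""
    rng = random.Random(seed)
    classes = []
    for k in range(n_classes):
        # Factor shared by the whole class (e.g. group morale), spread evenly.
        shared = -1.5 + 3.0 * k / (n_classes - 1)
        pupils = [
            (shared + rng.gauss(0, 1), shared + rng.gauss(0, 1))
            for _ in range(n_pupils)
        ]
        classes.append(pupils)
    return classes


def three_correlations(classes):
    """Return (within-class r, pooled r, r between class means)."""
    pooled_k, pooled_c = [], []
    within_k, within_c = [], []
    mean_k, mean_c = [], []
    for pupils in classes:
        ks = [p[0] for p in pupils]
        cs = [p[1] for p in pupils]
        mk = sum(ks) / len(ks)
        mc = sum(cs) / len(cs)
        pooled_k += ks                          # deviations from population mean
        pooled_c += cs
        within_k += [k - mk for k in ks]        # deviations from class mean
        within_c += [c - mc for c in cs]
        mean_k.append(mk)                       # one point per class
        mean_c.append(mc)
    return (
        pearson(within_k, within_c),
        pearson(pooled_k, pooled_c),
        pearson(mean_k, mean_c),
    )


if __name__ == "__main__":
    within, pooled, between = three_correlations(simulate())
    print("from class-mean deviations:     ", round(within, 2))
    print("from population-mean deviations:", round(pooled, 2))
    print("between class means:            ", round(between, 2))
```

In this simulation knowledge and conduct are unrelated within any one
class, yet the class means correlate highly, which is the sense in
which a relationship may depend on factors common to the group but not
similarly related in individuals.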

Hightower (68) working with Starbuck, tested from 300 to 3,000 
pupils using a Biblical Knowledge Test and most of the honesty and 
service tests of the Character Education Inquiry. Correlations of 
Biblical knowledge with conduct were uniformly zero, correlations 
with teachers’ ratings occasionally went up to .2 or .3. Delinquents 
surpassed public school pupils of similar grade in the Bible knowledge 
tests. On the conduct tests the delinquents were superior in one 
comparison, the public school pupils in three, and in four cases the 
difference was not significant. Howells (70) found those more con- 
servative in religious outlook to be less intelligent, more suggestible 
and less willing to stand pain in laboratory tests. 

Terman (22), Witty (166) and Lamson (88) applied some char- 
acter measures to intellectually gifted pupils. Gifted pupils cheat 
less, read more, are elected to more offices, are more apt to choose
professions, report fewer neurotic symptoms, have normally versatile
play interests, are less masculine in interest (if boys) or less feminine
(if girls), are slightly above average in fairmindedness, are rated by 
teachers as well-behaved, persistent, popular, not considered queer or 
conceited or abnormal. About 25 per cent appeared to have mild 
personality problems, 5 per cent to have serious problems. In gen- 
eral girls are rated slightly better than boys, as in all reputation 
studies. Of special interest are the correlations over five years 
reported by Terman. For the Wyman Interest Test these were .31,
.37 and .15, corresponding to .81, .74 and .61 over ten days. For
the Woodworth-Cady Blank reliability was .42 over five years, but
.75 over a week or two.

Vetter applied a battery of tests in the study of social atti-
tudes (154). He found conservatives more apt to be women,
prosperous, youngest children, republicans, Gentiles, and below the 
radicals in intelligence. Three studies (27, 56, 74) fail to
demonstrate differences between negroes and whites in suggestibility, 
ascendance, or emotional association complexes. Oliver (121) com- 
pared extreme introverts and extraverts on the Colgate test, finding 
no relation to age, intelligence, ascendance, prejudice, social in- 
telligence, but some difference in scholarship, emotional traits, likes 
and dislikes. Bellingrath (13) compared high school leaders in 
extra-curricular life with non-leaders, found boy leaders more apt 
to be rated as interested in continuing study, girl leaders rated as 
superior in neatness, honesty, interest, initiative, ambition, persistence, 
reliability, and stability. Correlation of leadership with intelligence 
was —.14; with scholarship, school habit rating, socio-economic 
status and introversion the relationships were all zero. Jersild (76)
reported intercorrelations among 42 college students on tests of in- 
telligence, social intelligence, emotional symptoms, ascendance, and 
ratings on ascendance, beauty and amiability. Allport test and rating 
on ascendance agree to the extent of a correlation of .5, beauty and 
amiability are similarly related, general and social intelligence give 
a correlation of .4, while the other relationships are considerably 
lower. Berne (14) combined diary observations, rating scales, and 
ten experimental behavior situations involving obedience, cooperation, 
property rights, etc. Few of the correlations with CA (MA con- 
stant) or MA (CA constant) were high. Interest in the group, 
independence, rivalry, socially controlled behavior, responsibility, 
affection, and ascendance showed correlations with MA above .4. 
Correlation between experimental situation tests and conduct ratings 
on the traits involved averaged .76. Kovarsky (84) discusses the 
value of the psychological profile (Rossolimo’s) including tests of 
will and emotion. 


XII. SELECTIVE RESUMÉ


The outstanding event of the year has been the completion of 
the series of studies by Hartshorne and May with the interrelation- 
ships they present. The publication of their tests is noteworthy. 
The publication of the series of attitude scales following Thurstone’s 
technique is an outstanding contribution. Adams’ objectivity index 
has splendid possibilities in improving test technique. Several new 
measures of inferiority feeling, of happiness and satisfaction, of the
presence and nature of problems in emotional adjustment point to 
a very strong present interest. The trend in Germany toward the 
analysis of individual types, manifested in imagination, association, 
speech, and in motor habits, is in marked contrast with the mass
testing in the United States. Behavior observation is achieving
splendid results in nursery schools but has hardly entered other fields.
The pioneer tests: Pressey, Downey, etc., have practically dis- 
appeared from the current studies. Introversion-extraversion is 
slipping rapidly into disrepute. The psychogalvanic reflex remains an 
interesting phenomenon but is less confidently connected with emo- 
tion. There are new inventions, like the Guess Who technique, in 
the field of reputation measures and a greatly increased confidence 
in the significance of reputation. There seems possibly to be an 
over-use of cross-sectional studies with correlational techniques and a 
paucity of experimental before-and-after studies. The amount of 
first class psychological study being given to character and personality 
is showing healthy increase. 

Subscriptions to Psychological Abstracts ..... 4,542.00
pe om Cheesy, ils ceds ds bucscatinccs 7.31 
$8,615.68 
Balance on hand, December 15, 1931: 
On checking account .................... $3,818.74
On savings account ..................... 9,302.40
13,121.14 

$21,736.82 

ACCOUNT OF NINTH INTERNATIONAL CONGRESS DEPOSITED WITH
AMERICAN PSYCHOLOGICAL ASSOCIATION:


On huni; Deweeay Ta, Whe oo Se. 05k eo $2,216.49 
I as ond nncd bk CE Kk & beset ana + aebbekine «kane 87 .76 
$2,304.25 

LEONARD CARMICHAEL,
Treasurer.

Audited and found correct by
CARROLL C. PRATT.
MICHAEL J. ZIGLER.


Date: December 19, 1931. 


NOTES AND NEWS


According to Science, in the winter and spring Professor
Thomas R. Garth, of the department of psychology at the University 
of Denver, has engaged to give a lecture on “ Race Psychology” at 
various institutions in the Middle West and in the East. His itinerary 
will include the University of Iowa, Wellesley College, Smith College,
the New School of Social Science, New York City, Davidson College, 
North Carolina, and Washington University, St. Louis. 

Education and Changing Society is to be the theme of the Sixth 
World Conference of the New Education Fellowship which will be 
held in Nice, France, next summer. The dates are July 3 
August 12, 1932. Further information may be obtained from 
Frances Fenton Park, secretary, 425 West 123rd Street, New York 
City. 

The Tenth International Congress of Psychology will be held in
Copenhagen from the 22nd to the 27th of August. Reservations may
be made with the American Express Company. Tours to psycho-
logical institutions in neighboring countries have been planned.